Duplicate detection - step 3: remove true duplicates¶
This notebook runs the third part of the duplicate detection algorithm on a dataframe with the following columns:
- archiveType (used for duplicate detection algorithm)
- dataSetName
- datasetId
- geo_meanElev (used for duplicate detection algorithm)
- geo_meanLat (used for duplicate detection algorithm)
- geo_meanLon (used for duplicate detection algorithm)
- geo_siteName (used for duplicate detection algorithm)
- interpretation_direction
- interpretation_seasonality
- interpretation_variable
- interpretation_variableDetail
- originalDataURL
- originalDatabase
- paleoData_notes
- paleoData_proxy (used for duplicate detection algorithm)
- paleoData_units
- paleoData_values (used for duplicate detection algorithm, test for correlation, RMSE, correlation of 1st difference, RMSE of 1st difference)
- paleoData_variableName
- year (used for duplicate detection algorithm)
- yearUnits
This interactive notebook (dup_removal.ipynb) removes the duplicates flagged in dup_detection.ipynb, following the decisions made in dup_decision.ipynb. The decisions include
- removal of redundant duplicates
- creation of composites
based on the operator decisions as specified in data/DATABASENAME/dup_detection/dup_decisions_DATABASENAME_AUTHORINITIALS_YY-MM-DD.csv.
Ultimately, a duplicate-free dataframe is saved under
- data/DATABASENAME/DATABASENAME_dupfree.pkl
- data/DATABASENAME/DATABASENAME_dupfree_data.csv
- data/DATABASENAME/DATABASENAME_dupfree_year.csv
- data/DATABASENAME/DATABASENAME_dupfree_metadata.csv
10/11/2025 by LL: tidied up with revised data organisation and prepared for documentation
02/12/2024 by LL: modified the compositing process for metadata to fix bugs and make it more user friendly. Added some extra information to the bottom of the file (prior to the figures).
22/10/2024 by LL: added the composite option for duplicates (create z-scores and average over shared time period)
30/09/2024 by LL: keep all original database values for removed duplicates with more than one original database
Author: Lucie Luecke, created 27/9/2024
Initialisation¶
Set up working environment¶
Make sure that repo_root is set correctly; it should be your_root_dir/dod2k. This should be the working directory throughout this notebook (and all other notebooks).
%load_ext autoreload
%autoreload 2
import sys
import os
from pathlib import Path
# Add parent directory to path (works from any notebook in notebooks/)
# the repo_root should be the parent directory of the notebooks folder
current_dir = Path().resolve()
# Determine repo root
if current_dir.name == 'dod2k': repo_root = current_dir
elif current_dir.parent.name == 'dod2k': repo_root = current_dir.parent
else: raise Exception('Please review the repo root structure (see first cell).')
# Update cwd and path only if needed
if os.getcwd() != str(repo_root):
    os.chdir(repo_root)
if str(repo_root) not in sys.path:
    sys.path.insert(0, str(repo_root))
print(f"Repo root: {repo_root}")
if str(os.getcwd())==str(repo_root):
    print(f"Working directory matches repo root. ")
Repo root: /home/jupyter-lluecke/dod2k_v2.0/dod2k
Working directory matches repo root.
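The check in the cell above only accepts the current directory or its direct parent. A slightly more general variant, sketched here under the assumption that the repository folder is always called dod2k, walks up through all parent directories instead. The helper name find_repo_root is hypothetical and not part of dod2k_utilities.

```python
from pathlib import Path


def find_repo_root(start, repo_name="dod2k"):
    """Return the closest directory named `repo_name`, starting at `start` and walking upwards.

    Hypothetical helper for illustration; not part of dod2k_utilities.
    """
    start = Path(start).resolve()
    # check the start directory itself first, then each of its parents
    for candidate in (start, *start.parents):
        if candidate.name == repo_name:
            return candidate
    raise FileNotFoundError(f"No directory named '{repo_name}' found at or above {start}")
```

Calling find_repo_root(Path()) from any notebook subfolder would then return the same root that the cell above determines, and it raises an error instead of silently continuing when no matching folder exists.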
import pandas as pd
import numpy as np
import datetime
from dod2k_utilities import ut_functions as utf # contains utility functions
from dod2k_utilities import ut_duplicate_search as dup # contains utility functions
Load dataset¶
Define the dataset which needs to be screened for duplicates. Input files for the duplicate detection mechanism need to be compact dataframes (pandas dataframes with standardised columns and entry formatting).
The function load_compact_dataframe_from_csv loads the dataframe from a csv file in data/DB/, where DB is the name of the database. The database name (db_name) can be
- pages2k
- ch2k
- iso2k
- sisal
- fe23
for the individual databases, or
- all_merged
to load the merged database of all individual databases, or the name of any user-defined compact dataframe.
# load dataframe
db_name='all_merged'
# db_name='ch2k'
df = utf.load_compact_dataframe_from_csv(db_name)
print(df.info())
df.name = db_name
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5879 entries, 0 to 5878
Data columns (total 21 columns):
 #   Column                         Non-Null Count  Dtype
---  ------                         --------------  -----
 0   archiveType                    5879 non-null   object
 1   dataSetName                    5879 non-null   object
 2   datasetId                      5879 non-null   object
 3   geo_meanElev                   5780 non-null   float32
 4   geo_meanLat                    5879 non-null   float32
 5   geo_meanLon                    5879 non-null   float32
 6   geo_siteName                   5879 non-null   object
 7   interpretation_direction       5879 non-null   object
 8   interpretation_seasonality     5879 non-null   object
 9   interpretation_variable        5879 non-null   object
 10  interpretation_variableDetail  5879 non-null   object
 11  originalDataURL                5879 non-null   object
 12  originalDatabase               5879 non-null   object
 13  paleoData_notes                5879 non-null   object
 14  paleoData_proxy                5879 non-null   object
 15  paleoData_sensorSpecies        5879 non-null   object
 16  paleoData_units                5879 non-null   object
 17  paleoData_values               5879 non-null   object
 18  paleoData_variableName         5879 non-null   object
 19  year                           5879 non-null   object
 20  yearUnits                      5879 non-null   object
dtypes: float32(3), object(18)
memory usage: 895.8+ KB
None
Set datasetId as dataframe index to reliably identify the duplicates:
df.set_index('datasetId', inplace = True)
df['datasetId']=df.index
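Indexing by datasetId only identifies records reliably if every ID occurs exactly once. A minimal sketch of such a check follows, written in plain Python on made-up IDs so that it runs without the dataframe; the helper name find_repeated_ids is hypothetical. In the notebook itself, df.index.is_unique gives the same yes/no answer.

```python
from collections import Counter


def find_repeated_ids(ids):
    """Return a sorted list of the IDs that occur more than once."""
    counts = Counter(ids)
    return sorted(rec_id for rec_id, n in counts.items() if n > 1)


# made-up IDs for illustration only
example_ids = ["rec_A", "rec_B", "rec_A", "rec_C"]
repeated = find_repeated_ids(example_ids)
print(repeated)
```

An empty result means that the IDs are safe to use as an index.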
Input operator's credentials¶
To keep the process as transparent and reproducible as possible, enter the operator's credentials here.
These details are used to flag the intermediate output files and are provided along with the final duplicate-free dataset.
initials = 'LL'
fullname = 'Lucie Luecke'
email = 'ljluec1@st-andrews.ac.uk'
operator_details = [initials, fullname, email]
Apply duplicate decisions to dataframe¶
Load duplicate decisions from csv¶
Please specify the date of the decision process below. The decision output file is then loaded from data/DBNAME/dup_detection/dup_decisions_DBNAME_INITIALS_DATE.csv.
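The date string is expected in YY-MM-DD form. A minimal sketch of how such a string and the resulting file stem could be built is given here; the helper name decision_file_stem is hypothetical, and the path pattern is copied from the sentence above.

```python
import datetime


def decision_file_stem(db_name, initials, date=None):
    """Build the path of the decision file without the '.csv' suffix.

    `date` is a 'YY-MM-DD' string; if omitted, today's UTC date is used.
    Hypothetical helper for illustration; not part of dod2k_utilities.
    """
    if date is None:
        now = datetime.datetime.now(datetime.timezone.utc)
        date = now.strftime("%y-%m-%d")
    return f"data/{db_name}/dup_detection/dup_decisions_{db_name}_{initials}_{date}"


print(decision_file_stem("all_merged", "LL", "25-11-11"))
```

Passing the date explicitly, as in the cell that follows, is the safer choice whenever the decisions were not made on the day the notebook is run.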
# date = str(datetime.datetime.utcnow())[2:10]
date='25-11-11'
filename = f'data/{df.name}/dup_detection/dup_decisions_{df.name}_{initials}_{date}'
data, header = dup.read_csv(filename, header=True)
df_decisions = pd.read_csv(filename+'.csv', header=5)
for hh in header:
    print(hh)
print(df_decisions.columns)
print(df.name)
Decisions for duplicate candidate pairs.
Operated by Lucie Luecke (LL)
E-Mail: ljluec1@st-andrews.ac.uk
Created on: 2025-11-11 12:59:20.398076 (UTC)
test test test
Index(['index 1', 'index 2', 'figure path', 'datasetId 1', 'datasetId 2',
'originalDatabase 1', 'originalDatabase 2', 'geo_siteName 1',
'geo_siteName 2', 'geo_meanLat 1', 'geo_meanLat 2', 'geo_meanLon 1',
'geo_meanLon 2', 'geo_meanElevation 1', 'geo_meanElevation 2',
'archiveType 1', 'archiveType 2', 'paleoData_proxy 1',
'paleoData_proxy 2', 'originalDataURL 1', 'originalDataURL 2', 'year 1',
'year 2', 'Decision 1', 'Decision 2', 'Decision type',
'Decision comment'],
dtype='object')
all_merged
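The decision file therefore consists of five free-text lines followed by an ordinary table, which is why the table is read with header=5. As an illustration of that layout only, and not of how dup.read_csv is implemented, the following sketch splits a made-up miniature file with the standard library. The function name split_preamble_and_table and the example content are assumptions.

```python
import csv
import io


def split_preamble_and_table(text, n_header_lines=5):
    """Split a decision file into its free-text preamble and its table rows.

    Assumes the first `n_header_lines` lines are free text and the next line
    holds the column names. Fields with embedded line breaks are not supported.
    """
    lines = text.splitlines()
    preamble = lines[:n_header_lines]
    table = io.StringIO("\n".join(lines[n_header_lines:]))
    return preamble, list(csv.DictReader(table))


# made-up miniature file for illustration only
example = "\n".join([
    "Decisions for duplicate candidate pairs.",
    "Operated by Jane Doe (JD)",
    "E-Mail: jane.doe@example.org",
    "Created on: 2025-01-01 00:00:00 (UTC)",
    "free-text comment",
    "datasetId 1,datasetId 2,Decision 1,Decision 2",
    "rec_A,rec_B,KEEP,REMOVE",
])
preamble, rows = split_preamble_and_table(example)
print(preamble[1])
print(rows[0]["Decision 2"])
```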
Save a list of all candidate IDs individually (not as pairs) and collect the associated decisions.
# all candidate IDs
candidate_IDs = list(df_decisions['datasetId 1'])
candidate_IDs += list(df_decisions['datasetId 2'])
candidate_IDs = np.unique(candidate_IDs)
# decisions
decisions = {}
for ind in df_decisions.index:
    id1, id2 = df_decisions.loc[ind, ['datasetId 1', 'datasetId 2']]
    dec1, dec2 = df_decisions.loc[ind, ['Decision 1', 'Decision 2']]
    for rec_id, dec in zip([id1, id2], [dec1, dec2]):
        if rec_id not in decisions: decisions[rec_id] = []
        decisions[rec_id]+=[dec]
Save all the duplicate details in one dictionary, which will be used in the duplicate free dataframe (final output) df_dupfree to provide details on the duplicate detection process.
dup_details = dup.provide_dup_details(df_decisions, header)
Note that any one record can appear more than once and have multiple decisions associated with it (e.g. 'REMOVE', 'KEEP' or 'COMPOSITE').
To remove the duplicates, we need to implement the following steps:
- Records to be REMOVED. Remove all records from the dataframe which are associated with the decision 'REMOVE' and save the result in df_dupfree_rmv.
- Records to be COMPOSITED. Create composites of the records and save them in df_composite.
- Now check for records which have both 'REMOVE' and 'COMPOSITE' associated. These are potentially remaining duplicates. Here, the operator is once again asked to make decisions and run a 'mini' version of the duplicate workflow.
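The check in the last step can be sketched as follows, assuming a dictionary with the same {id: [decision, ...]} layout as the decisions dictionary built earlier. The function name find_conflicting_ids and the example entries are hypothetical.

```python
def find_conflicting_ids(decisions, first="REMOVE", second="COMPOSITE"):
    """Return the IDs whose list of decisions contains both `first` and `second`."""
    return sorted(
        rec_id
        for rec_id, decs in decisions.items()
        if first in decs and second in decs
    )


# made-up decisions for illustration only
example_decisions = {
    "rec_A": ["KEEP"],
    "rec_B": ["REMOVE", "COMPOSITE"],
    "rec_C": ["COMPOSITE"],
    "rec_D": ["REMOVE", "REMOVE"],
}
conflicts = find_conflicting_ids(example_decisions)
print(conflicts)
```

Only records that carry both decisions are returned; these are the ones the operator would need to revisit.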
1. Records to be REMOVED¶
First, remove all records to which the decision 'REMOVE' and/or 'COMPOSITE' applies and store the remaining records in df_dupfree_rmv. The removed records are stored in df_duplica_rmv (for later inspection).
# load the records TO BE REMOVED
remove_IDs = list(df_decisions['datasetId 1'][np.isin(df_decisions['Decision 1'],['REMOVE', 'COMPOSITE'])])
remove_IDs += list(df_decisions['datasetId 2'][np.isin(df_decisions['Decision 2'],['REMOVE', 'COMPOSITE'])])
remove_IDs = np.unique(remove_IDs)
df_duplica_rmv = df.loc[remove_IDs] # df containing only records which were removed
df_dupfree_rmv = df.drop(remove_IDs) # df freed from 'REMOVE' type duplicates
print(f'Removed {len(df_duplica_rmv)} REMOVE type records.')
print(f'REMOVE type duplicate free dataset contains {len(df_dupfree_rmv)} records.')
print('Removed the following IDs:', remove_IDs)
print(df.name)
Removed 472 REMOVE type records. REMOVE type duplicate free dataset contains 5407 records. Removed the following IDs: ['FE23_asia_mong006' 'FE23_asia_mong007w' 'FE23_asia_mong011' 'FE23_asia_mong012' 'FE23_asia_russ127w' 'FE23_asia_russ130w' 'FE23_asia_russ137w' 'FE23_australia_newz003' 'FE23_australia_newz008' 'FE23_australia_newz014' 'FE23_australia_newz018' 'FE23_australia_newz019' 'FE23_australia_newz092' 'FE23_europe_swed019w' 'FE23_europe_swed021w' 'FE23_northamerica_canada_cana002' 'FE23_northamerica_canada_cana029' 'FE23_northamerica_canada_cana062' 'FE23_northamerica_canada_cana091' 'FE23_northamerica_canada_cana094' 'FE23_northamerica_canada_cana096' 'FE23_northamerica_canada_cana097' 'FE23_northamerica_canada_cana100' 'FE23_northamerica_canada_cana105' 'FE23_northamerica_canada_cana111' 'FE23_northamerica_canada_cana113' 'FE23_northamerica_canada_cana153' 'FE23_northamerica_canada_cana155' 'FE23_northamerica_canada_cana162' 'FE23_northamerica_canada_cana168w' 'FE23_northamerica_canada_cana169w' 'FE23_northamerica_canada_cana170w' 'FE23_northamerica_canada_cana210' 'FE23_northamerica_canada_cana225' 'FE23_northamerica_canada_cana231' 'FE23_northamerica_canada_cana234' 'FE23_northamerica_canada_cana238' 'FE23_northamerica_mexico_mexi020' 'FE23_northamerica_mexico_mexi022' 'FE23_northamerica_mexico_mexi023' 'FE23_northamerica_mexico_mexi043' 'FE23_northamerica_usa_ak010' 'FE23_northamerica_usa_ak014' 'FE23_northamerica_usa_ak021' 'FE23_northamerica_usa_ak046' 'FE23_northamerica_usa_ak057' 'FE23_northamerica_usa_ak058' 'FE23_northamerica_usa_ak060' 'FE23_northamerica_usa_ak070' 'FE23_northamerica_usa_ak094' 'FE23_northamerica_usa_ak6' 'FE23_northamerica_usa_az553' 'FE23_northamerica_usa_az555' 'FE23_northamerica_usa_ca066' 'FE23_northamerica_usa_ca067' 'FE23_northamerica_usa_ca512' 'FE23_northamerica_usa_ca535' 'FE23_northamerica_usa_ca560' 'FE23_northamerica_usa_ca603' 'FE23_northamerica_usa_ca606' 'FE23_northamerica_usa_ca609' 'FE23_northamerica_usa_ca613' 
'FE23_northamerica_usa_co552' 'FE23_northamerica_usa_co553' 'FE23_northamerica_usa_co554' 'FE23_northamerica_usa_co586' 'FE23_northamerica_usa_co633' 'FE23_northamerica_usa_id008' 'FE23_northamerica_usa_id013' 'FE23_northamerica_usa_me010' 'FE23_northamerica_usa_me017' 'FE23_northamerica_usa_me018' 'FE23_northamerica_usa_mo009' 'FE23_northamerica_usa_mt108' 'FE23_northamerica_usa_mt112' 'FE23_northamerica_usa_mt113' 'FE23_northamerica_usa_mt116' 'FE23_northamerica_usa_nj001' 'FE23_northamerica_usa_nj002' 'FE23_northamerica_usa_nm055' 'FE23_northamerica_usa_nv060' 'FE23_northamerica_usa_nv512' 'FE23_northamerica_usa_nv513' 'FE23_northamerica_usa_or042' 'FE23_northamerica_usa_or043' 'FE23_northamerica_usa_ut511' 'FE23_northamerica_usa_wa069' 'FE23_northamerica_usa_wa071' 'FE23_northamerica_usa_wa072' 'FE23_northamerica_usa_wa081' 'FE23_northamerica_usa_wa083' 'FE23_northamerica_usa_wa097' 'FE23_northamerica_usa_wa104' 'FE23_northamerica_usa_wy021' 'FE23_northamerica_usa_wy022' 'FE23_northamerica_usa_wy023' 'FE23_northamerica_usa_wy024' 'FE23_northamerica_usa_wy025' 'FE23_northamerica_usa_wy030' 'FE23_southamerica_arge085' 'FE23_southamerica_chil017' 'ch2k_AS05GUA01_302' 'ch2k_BA04FIJ01_558' 'ch2k_BA04FIJ02_382' 'ch2k_CA07FLI01_400' 'ch2k_CA13SAP01_188' 'ch2k_CA14TIM01_64' 'ch2k_CH98PIR01_116' 'ch2k_CO00MAL01_412' 'ch2k_CO03PAL01_110' 'ch2k_CO03PAL02_8' 'ch2k_CO03PAL03_6' 'ch2k_CO03PAL04_452' 'ch2k_CO03PAL05_212' 'ch2k_CO03PAL06_386' 'ch2k_CO03PAL07_14' 'ch2k_CO03PAL08_472' 'ch2k_CO03PAL09_358' 'ch2k_CO03PAL10_324' 'ch2k_CO93TAR01_408' 'ch2k_DA06MAF01_78' 'ch2k_DA06MAF02_104' 'ch2k_DE13HAI01_424' 'ch2k_DE13HAI01_430' 'ch2k_DE13HAI01_432' 'ch2k_DR99ABR01_264' 'ch2k_DR99ABR01_266' 'ch2k_DU94URV01_470' 'ch2k_EV18ROC01_186' 'ch2k_FE09OGA01_304' 'ch2k_FE18RUS01_492' 'ch2k_FL18DTO02_554' 'ch2k_GO12SBV01_396' 'ch2k_GU99NAU01_314' 'ch2k_HE08LRA01_76' 'ch2k_HE10GUA01_244' 'ch2k_HE13MIS01_194' 'ch2k_KI04MCV01_366' 'ch2k_KI14PAR01_516' 'ch2k_KI14PAR01_518' 'ch2k_KU00NIN01_150' 
'ch2k_KU99HOU01_40' 'ch2k_LI06FIJ01_582' 'ch2k_LI06RAR01_12' 'ch2k_LI06RAR02_270' 'ch2k_LI94SEC01_436' 'ch2k_LI99CLI01_486' 'ch2k_MO06PED01_226' 'ch2k_NA09MAL01_84' 'ch2k_NU11PAL01_52' 'ch2k_OS14UCP01_236' 'ch2k_PF04PBA01_204' 'ch2k_QU06RAB01_144' 'ch2k_QU96ESV01_422' 'ch2k_RE18CAY01_30' 'ch2k_RO19YUC01_340' 'ch2k_SW98STP01_86' 'ch2k_TU01DEP01_450' 'ch2k_TU95MAD01_24' 'ch2k_UR00MAI01_22' 'ch2k_WU13TON01_506' 'ch2k_XI17HAI01_128' 'ch2k_XI17HAI01_134' 'ch2k_XI17HAI01_136' 'ch2k_ZI04IFR01_26' 'ch2k_ZI14IFR02_524' 'ch2k_ZI14TUR01_480' 'ch2k_ZI14TUR01_482' 'ch2k_ZI15BUN01_490' 'ch2k_ZI15CLE01_440' 'ch2k_ZI15IMP01_330' 'ch2k_ZI15IMP02_202' 'ch2k_ZI15MER01_4' 'ch2k_ZI15TAN01_280' 'iso2k_1286' 'iso2k_1554' 'iso2k_1556' 'iso2k_1704' 'iso2k_1817' 'iso2k_1851' 'iso2k_1855' 'iso2k_298' 'iso2k_299' 'iso2k_404' 'iso2k_505' 'iso2k_549' 'iso2k_550' 'iso2k_579' 'iso2k_58' 'iso2k_702' 'iso2k_775' 'iso2k_786' 'iso2k_788' 'iso2k_806' 'iso2k_811' 'iso2k_98' 'pages2k_0' 'pages2k_1003' 'pages2k_1004' 'pages2k_1047' 'pages2k_1048' 'pages2k_1076' 'pages2k_1108' 'pages2k_1137' 'pages2k_1147' 'pages2k_1156' 'pages2k_1159' 'pages2k_1160' 'pages2k_1188' 'pages2k_122' 'pages2k_1230' 'pages2k_1273' 'pages2k_1274' 'pages2k_1293' 'pages2k_13' 'pages2k_1360' 'pages2k_1364' 'pages2k_1365' 'pages2k_1370' 'pages2k_139' 'pages2k_140' 'pages2k_1441' 'pages2k_1444' 'pages2k_1470' 'pages2k_1471' 'pages2k_1486' 'pages2k_1488' 'pages2k_1490' 'pages2k_1491' 'pages2k_1497' 'pages2k_1518' 'pages2k_1519' 'pages2k_152' 'pages2k_1520' 'pages2k_153' 'pages2k_1547' 'pages2k_1587' 'pages2k_1618' 'pages2k_1619' 'pages2k_1643' 'pages2k_1656' 'pages2k_1657' 'pages2k_166' 'pages2k_1688' 'pages2k_1703' 'pages2k_1712' 'pages2k_1720' 'pages2k_1750' 'pages2k_1771' 'pages2k_1824' 'pages2k_1825' 'pages2k_1859' 'pages2k_1861' 'pages2k_1888' 'pages2k_1891' 'pages2k_1918' 'pages2k_192' 'pages2k_1922' 'pages2k_1923' 'pages2k_1932' 'pages2k_1942' 'pages2k_1973' 'pages2k_1976' 'pages2k_1978' 'pages2k_1985' 'pages2k_1991' 
'pages2k_1992' 'pages2k_1993' 'pages2k_1994' 'pages2k_203' 'pages2k_2034' 'pages2k_2042' 'pages2k_2085' 'pages2k_2094' 'pages2k_2131' 'pages2k_2149' 'pages2k_2150' 'pages2k_2177' 'pages2k_2203' 'pages2k_2214' 'pages2k_2220' 'pages2k_2273' 'pages2k_2290' 'pages2k_2300' 'pages2k_2309' 'pages2k_2311' 'pages2k_2338' 'pages2k_2339' 'pages2k_2343' 'pages2k_2344' 'pages2k_238' 'pages2k_242' 'pages2k_2451' 'pages2k_2477' 'pages2k_2480' 'pages2k_2490' 'pages2k_2493' 'pages2k_2494' 'pages2k_2502' 'pages2k_2510' 'pages2k_2514' 'pages2k_2517' 'pages2k_2534' 'pages2k_2538' 'pages2k_258' 'pages2k_2582' 'pages2k_2592' 'pages2k_2594' 'pages2k_2595' 'pages2k_2598' 'pages2k_26' 'pages2k_2604' 'pages2k_2606' 'pages2k_2609' 'pages2k_2612' 'pages2k_2613' 'pages2k_2617' 'pages2k_263' 'pages2k_2642' 'pages2k_2655' 'pages2k_267' 'pages2k_2684' 'pages2k_2697' 'pages2k_2698' 'pages2k_27' 'pages2k_271' 'pages2k_2730' 'pages2k_2743' 'pages2k_2750' 'pages2k_2755' 'pages2k_2758' 'pages2k_2759' 'pages2k_2795' 'pages2k_2798' 'pages2k_2830' 'pages2k_2843' 'pages2k_2864' 'pages2k_2899' 'pages2k_2901' 'pages2k_2904' 'pages2k_2906' 'pages2k_2922' 'pages2k_2953' 'pages2k_2959' 'pages2k_2976' 'pages2k_2983' 'pages2k_2996' 'pages2k_2997' 'pages2k_3002' 'pages2k_3023' 'pages2k_3030' 'pages2k_3033' 'pages2k_3038' 'pages2k_3045' 'pages2k_305' 'pages2k_3058' 'pages2k_3059' 'pages2k_3064' 'pages2k_3068' 'pages2k_307' 'pages2k_3085' 'pages2k_3107' 'pages2k_3108' 'pages2k_3129' 'pages2k_3132' 'pages2k_3134' 'pages2k_315' 'pages2k_317' 'pages2k_3170' 'pages2k_3179' 'pages2k_3187' 'pages2k_3191' 'pages2k_3196' 'pages2k_3202' 'pages2k_3236' 'pages2k_3239' 'pages2k_3243' 'pages2k_3263' 'pages2k_3266' 'pages2k_3307' 'pages2k_3313' 'pages2k_3334' 'pages2k_3342' 'pages2k_3352' 'pages2k_3372' 'pages2k_3374' 'pages2k_3404' 'pages2k_3419' 'pages2k_3473' 'pages2k_3503' 'pages2k_3524' 'pages2k_3531' 'pages2k_3544' 'pages2k_3545' 'pages2k_3550' 'pages2k_3552' 'pages2k_3554' 'pages2k_3571' 'pages2k_3583' 'pages2k_3599' 
'pages2k_3613' 'pages2k_3626' 'pages2k_3629' 'pages2k_3630' 'pages2k_3631' 'pages2k_3642' 'pages2k_366' 'pages2k_3663' 'pages2k_385' 'pages2k_387' 'pages2k_395' 'pages2k_397' 'pages2k_409' 'pages2k_418' 'pages2k_420' 'pages2k_421' 'pages2k_427' 'pages2k_433' 'pages2k_435' 'pages2k_445' 'pages2k_446' 'pages2k_462' 'pages2k_468' 'pages2k_474' 'pages2k_478' 'pages2k_495' 'pages2k_500' 'pages2k_565' 'pages2k_592' 'pages2k_610' 'pages2k_634' 'pages2k_698' 'pages2k_71' 'pages2k_711' 'pages2k_712' 'pages2k_730' 'pages2k_757' 'pages2k_81' 'pages2k_814' 'pages2k_818' 'pages2k_83' 'pages2k_830' 'pages2k_842' 'pages2k_878' 'pages2k_88' 'pages2k_881' 'pages2k_895' 'pages2k_900' 'pages2k_940' 'pages2k_945' 'pages2k_960' 'pages2k_976' 'sisal_113.0_66' 'sisal_115.0_69' 'sisal_201.0_133' 'sisal_205.0_141' 'sisal_253.0_171' 'sisal_271.0_174' 'sisal_272.0_177' 'sisal_273.0_179' 'sisal_278.0_184' 'sisal_294.0_194' 'sisal_305.0_199' 'sisal_329.0_213' 'sisal_330.0_215' 'sisal_446.0_292' 'sisal_47.0_21' 'sisal_47.0_22' 'sisal_47.0_23' 'sisal_471.0_314' 'sisal_896.0_531' 'sisal_896.0_533'] all_merged
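One consequence of this selection is that a record is dropped as soon as any of its candidate pairs marks it as 'REMOVE' or 'COMPOSITE', even if another pair marks it as 'KEEP'. The toy example below illustrates this with plain dictionaries in place of df_decisions; the function name ids_to_remove and all IDs are made up.

```python
def ids_to_remove(pairs, drop=("REMOVE", "COMPOSITE")):
    """Collect every ID that carries one of the `drop` decisions in at least one pair."""
    removed = set()
    for pair in pairs:
        for side in ("1", "2"):
            if pair[f"Decision {side}"] in drop:
                removed.add(pair[f"datasetId {side}"])
    return sorted(removed)


# made-up decision table for illustration only
pairs = [
    {"datasetId 1": "rec_A", "datasetId 2": "rec_B",
     "Decision 1": "KEEP", "Decision 2": "REMOVE"},
    {"datasetId 1": "rec_A", "datasetId 2": "rec_C",
     "Decision 1": "REMOVE", "Decision 2": "KEEP"},
    {"datasetId 1": "rec_D", "datasetId 2": "rec_E",
     "Decision 1": "COMPOSITE", "Decision 2": "COMPOSITE"},
]
to_remove = ids_to_remove(pairs)
print(to_remove)
```

Here rec_A is kept in the first pair but removed in the second, so it ends up in the removal list, while rec_C is the only record that remains.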
# add columns on decision process to df_dupfree:
df_dupfree_rmv['duplicateDetails']='N/A'
for ID in dup_details:
    if ID in df_dupfree_rmv.index:
        if df_dupfree_rmv.at[ID, 'duplicateDetails']=='N/A':
            df_dupfree_rmv.at[ID, 'duplicateDetails']=dup_details[ID]
        else: df_dupfree_rmv.at[ID, 'duplicateDetails']+=dup_details[ID]
# df_dupfree_rmv[df_dupfree_rmv[ 'duplicateDetails']!='N/A'].at['ch2k_DE14DTO03_140', 'duplicateDetails']
2. Records to be COMPOSITED¶
Now identify all records to which the decision 'COMPOSITE' applies, create composites and store them in df_composite.
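According to the changelog at the top of this notebook, compositing converts the records to z-scores and averages them over their shared time period. The sketch below illustrates that idea, together with a simple rule for merging metadata: keep identical values, average differing numbers, and join differing text. It is not the implementation of dup.join_composites_metadata. The function names and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardise values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardise a constant series")
    return [(v - mu) / sigma for v in values]


def composite_series(years1, values1, years2, values2):
    """Average the z-scores of two records over the years they share."""
    lookup1 = dict(zip(years1, values1))
    lookup2 = dict(zip(years2, values2))
    shared = sorted(set(lookup1) & set(lookup2))
    if len(shared) < 2:
        raise ValueError("need at least two shared years")
    z1 = zscore([lookup1[y] for y in shared])
    z2 = zscore([lookup2[y] for y in shared])
    return shared, [(a + b) / 2 for a, b in zip(z1, z2)]


def merge_metadata(value1, value2):
    """Keep identical metadata, average differing numbers, join differing text."""
    if value1 == value2:
        return value1
    numeric = (int, float)
    if isinstance(value1, numeric) and isinstance(value2, numeric):
        return (value1 + value2) / 2
    return f"COMPOSITE: {value1} + {value2}"


# made-up records for illustration only
shared_years, composite = composite_series(
    [1900, 1901, 1902, 1903], [1.0, 2.0, 3.0, 4.0],
    [1901, 1902, 1903, 1904], [10.0, 20.0, 30.0, 5.0],
)
print(shared_years)
print(merge_metadata("Summer", "N/A"))
```

Standardising each record over the shared years only, rather than over its full length, keeps the two z-score series directly comparable before they are averaged.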
# add the column 'duplicateDetails' to df, in case it does not exist
if 'duplicateDetails' not in df.columns: df['duplicateDetails']='N/A'
# load the records to be composited
comp_ID_pairs = df_decisions[(df_decisions['Decision 1']=='COMPOSITE')&(df_decisions['Decision 2']=='COMPOSITE')]
# create new composite data and metadata from the pairs
# loop through the composite pairs and check metadata
df_composite = dup.join_composites_metadata(df, comp_ID_pairs, df_decisions, header)
pages2k_427 pages2k_433 Add the following note to duplicateDetails: archiveType: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identicalinterpretation_direction: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata 
identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identicalinterpretation_direction: Metadata identicalinterpretation_seasonality: Metadata identical
saved figure in /figs//all_merged/dup_detection//composite_pages2k_427_pages2k_433.pdf pages2k_435 pages2k_842 Add the following note to duplicateDetails: archiveType: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identicalinterpretation_direction: Metadata identical Add the following note to 
duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identicalinterpretation_direction: Metadata identicalinterpretation_seasonality: Metadata identical
saved figure in /figs//all_merged/dup_detection//composite_pages2k_435_pages2k_842.pdf pages2k_468 pages2k_3550 Add the following note to duplicateDetails: archiveType: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identicalinterpretation_direction: Metadata identical Add the following note to 
duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identicalinterpretation_direction: Metadata identicalinterpretation_seasonality: Metadata identical
saved figure in /figs//all_merged/dup_detection//composite_pages2k_468_pages2k_3550.pdf pages2k_2085 FE23_northamerica_canada_cana002 Add the following note to duplicateDetails: archiveType: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical -------------------------------------------------------------------------------- Metadata different for >>>geo_meanLat<<< in: pages2k_2085 (58.4) and FE23_northamerica_canada_cana002 (58.366665). Add the following note to duplicateDetails: Metadata differs for geo_meanLat in original records: pages2k_2085 (58.4) and FE23_northamerica_canada_cana002 (58.366665). geo_meanLat: Metadata averaged to: 58.38333 -------------------------------------------------------------------------------- Metadata different for >>>geo_meanLon<<< in: pages2k_2085 (-68.4) and FE23_northamerica_canada_cana002 (-68.38333). Add the following note to duplicateDetails: Metadata differs for geo_meanLon in original records: pages2k_2085 (-68.4) and FE23_northamerica_canada_cana002 (-68.38333). geo_meanLon: Metadata averaged to: -68.39166 -------------------------------------------------------------------------------- Metadata different for >>>geo_siteName<<< in: pages2k_2085 (Fort Chimo (Merged)) and FE23_northamerica_canada_cana002 (FortChimo(Merged)). Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2085 (Fort Chimo (Merged)) and FE23_northamerica_canada_cana002 (FortChimo(Merged)). geo_siteName: Metadata composited to: COMPOSITE: Fort Chimo (Merged) + FortChimo(Merged) Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2085 (Fort Chimo (Merged)) and FE23_northamerica_canada_cana002 (FortChimo(Merged)). 
geo_siteName: Metadata composited to: COMPOSITE: Fort Chimo (Merged) + FortChimo(Merged)paleoData_proxy: Metadata identical Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2085 (Fort Chimo (Merged)) and FE23_northamerica_canada_cana002 (FortChimo(Merged)). geo_siteName: Metadata composited to: COMPOSITE: Fort Chimo (Merged) + FortChimo(Merged)paleoData_proxy: Metadata identicalyearUnits: Metadata identical -------------------------------------------------------------------------------- Metadata different for >>>interpretation_variable<<< in: pages2k_2085 (temperature) and FE23_northamerica_canada_cana002 (NOT temperature NOT moisture). Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: pages2k_2085 (temperature) and FE23_northamerica_canada_cana002 (NOT temperature NOT moisture). interpretation_variable: Metadata composited to: COMPOSITE: temperature + NOT temperature NOT moisture -------------------------------------------------------------------------------- Metadata different for >>>interpretation_direction<<< in: pages2k_2085 (positive) and FE23_northamerica_canada_cana002 (N/A). Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_2085 (positive) and FE23_northamerica_canada_cana002 (N/A). interpretation_direction: Metadata composited to: COMPOSITE: positive + N/A -------------------------------------------------------------------------------- Metadata different for >>>interpretation_seasonality<<< in: pages2k_2085 (Summer) and FE23_northamerica_canada_cana002 (N/A). Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_2085 (Summer) and FE23_northamerica_canada_cana002 (N/A). interpretation_seasonality: Metadata composited to: COMPOSITE: Summer + N/A
saved figure in /figs//all_merged/dup_detection//composite_pages2k_2085_FE23_northamerica_canada_cana002.pdf
pages2k_2339 pages2k_2344
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identicalinterpretation_direction: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identicalinterpretation_direction: Metadata identicalinterpretation_seasonality: Metadata identical
saved figure in /figs//all_merged/dup_detection//composite_pages2k_2339_pages2k_2344.pdf
pages2k_2795 pages2k_2798
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identicalinterpretation_direction: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identicalinterpretation_direction: Metadata identicalinterpretation_seasonality: Metadata identical
saved figure in /figs//all_merged/dup_detection//composite_pages2k_2795_pages2k_2798.pdf
pages2k_2830 FE23_northamerica_mexico_mexi020
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLat<<< in: pages2k_2830 (31.0) and FE23_northamerica_mexico_mexi020 (30.966667).
Add the following note to duplicateDetails: Metadata differs for geo_meanLat in original records: pages2k_2830 (31.0) and FE23_northamerica_mexico_mexi020 (30.966667). geo_meanLat: Metadata averaged to: 30.983334
Add the following note to duplicateDetails: Metadata differs for geo_meanLat in original records: pages2k_2830 (31.0) and FE23_northamerica_mexico_mexi020 (30.966667). geo_meanLat: Metadata averaged to: 30.983334geo_meanLon: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_2830 (La Tasajera (San Pedro Martir)) and FE23_northamerica_mexico_mexi020 (LaTasajera(SanPedroMartir)).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2830 (La Tasajera (San Pedro Martir)) and FE23_northamerica_mexico_mexi020 (LaTasajera(SanPedroMartir)). geo_siteName: Metadata composited to: COMPOSITE: La Tasajera (San Pedro Martir) + LaTasajera(SanPedroMartir)
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2830 (La Tasajera (San Pedro Martir)) and FE23_northamerica_mexico_mexi020 (LaTasajera(SanPedroMartir)). geo_siteName: Metadata composited to: COMPOSITE: La Tasajera (San Pedro Martir) + LaTasajera(SanPedroMartir)paleoData_proxy: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2830 (La Tasajera (San Pedro Martir)) and FE23_northamerica_mexico_mexi020 (LaTasajera(SanPedroMartir)). geo_siteName: Metadata composited to: COMPOSITE: La Tasajera (San Pedro Martir) + LaTasajera(SanPedroMartir)paleoData_proxy: Metadata identicalyearUnits: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_variable<<< in: pages2k_2830 (temperature) and FE23_northamerica_mexico_mexi020 (moisture).
Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: pages2k_2830 (temperature) and FE23_northamerica_mexico_mexi020 (moisture). interpretation_variable: Metadata composited to: COMPOSITE: temperature + moisture
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_2830 (positive) and FE23_northamerica_mexico_mexi020 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_2830 (positive) and FE23_northamerica_mexico_mexi020 (N/A). interpretation_direction: Metadata composited to: COMPOSITE: positive + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_2830 (Summer) and FE23_northamerica_mexico_mexi020 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_2830 (Summer) and FE23_northamerica_mexico_mexi020 (N/A). interpretation_seasonality: Metadata composited to: COMPOSITE: Summer + N/A
saved figure in /figs//all_merged/dup_detection//composite_pages2k_2830_FE23_northamerica_mexico_mexi020.pdf
pages2k_2843 FE23_northamerica_usa_wa083
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLon<<< in: pages2k_2843 (-118.3) and FE23_northamerica_usa_wa083 (-118.333336).
Add the following note to duplicateDetails: Metadata differs for geo_meanLon in original records: pages2k_2843 (-118.3) and FE23_northamerica_usa_wa083 (-118.333336). geo_meanLon: Metadata averaged to: -118.316666
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_2843 (Sherman Creek Pass) and FE23_northamerica_usa_wa083 (ShermanCreekPass).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2843 (Sherman Creek Pass) and FE23_northamerica_usa_wa083 (ShermanCreekPass). geo_siteName: Metadata composited to: COMPOSITE: Sherman Creek Pass + ShermanCreekPass
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2843 (Sherman Creek Pass) and FE23_northamerica_usa_wa083 (ShermanCreekPass). geo_siteName: Metadata composited to: COMPOSITE: Sherman Creek Pass + ShermanCreekPasspaleoData_proxy: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2843 (Sherman Creek Pass) and FE23_northamerica_usa_wa083 (ShermanCreekPass). geo_siteName: Metadata composited to: COMPOSITE: Sherman Creek Pass + ShermanCreekPasspaleoData_proxy: Metadata identicalyearUnits: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2843 (Sherman Creek Pass) and FE23_northamerica_usa_wa083 (ShermanCreekPass). geo_siteName: Metadata composited to: COMPOSITE: Sherman Creek Pass + ShermanCreekPasspaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_2843 (positive) and FE23_northamerica_usa_wa083 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_2843 (positive) and FE23_northamerica_usa_wa083 (N/A). interpretation_direction: Metadata composited to: COMPOSITE: positive + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_2843 (Summer) and FE23_northamerica_usa_wa083 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_2843 (Summer) and FE23_northamerica_usa_wa083 (N/A). interpretation_seasonality: Metadata composited to: COMPOSITE: Summer + N/A
saved figure in /figs//all_merged/dup_detection//composite_pages2k_2843_FE23_northamerica_usa_wa083.pdf
pages2k_2899 pages2k_2901
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identicalinterpretation_direction: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identicalinterpretation_direction: Metadata identicalinterpretation_seasonality: Metadata identical
saved figure in /figs//all_merged/dup_detection//composite_pages2k_2899_pages2k_2901.pdf
pages2k_2904 pages2k_2906
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identicalinterpretation_direction: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identicalinterpretation_direction: Metadata identicalinterpretation_seasonality: Metadata identical
saved figure in /figs//all_merged/dup_detection//composite_pages2k_2904_pages2k_2906.pdf
pages2k_2922 FE23_northamerica_usa_ca603
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLat<<< in: pages2k_2922 (37.9) and FE23_northamerica_usa_ca603 (37.916668).
Add the following note to duplicateDetails: Metadata differs for geo_meanLat in original records: pages2k_2922 (37.9) and FE23_northamerica_usa_ca603 (37.916668). geo_meanLat: Metadata averaged to: 37.908333
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLon<<< in: pages2k_2922 (-119.2) and FE23_northamerica_usa_ca603 (-119.23333).
Add the following note to duplicateDetails: Metadata differs for geo_meanLon in original records: pages2k_2922 (-119.2) and FE23_northamerica_usa_ca603 (-119.23333). geo_meanLon: Metadata averaged to: -119.21666
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_2922 (Dana Plateau Inyo National Forest) and FE23_northamerica_usa_ca603 (DanaPlateauInyoNationalForest).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2922 (Dana Plateau Inyo National Forest) and FE23_northamerica_usa_ca603 (DanaPlateauInyoNationalForest). geo_siteName: Metadata composited to: COMPOSITE: Dana Plateau Inyo National Forest + DanaPlateauInyoNationalForest
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2922 (Dana Plateau Inyo National Forest) and FE23_northamerica_usa_ca603 (DanaPlateauInyoNationalForest). geo_siteName: Metadata composited to: COMPOSITE: Dana Plateau Inyo National Forest + DanaPlateauInyoNationalForestpaleoData_proxy: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2922 (Dana Plateau Inyo National Forest) and FE23_northamerica_usa_ca603 (DanaPlateauInyoNationalForest). geo_siteName: Metadata composited to: COMPOSITE: Dana Plateau Inyo National Forest + DanaPlateauInyoNationalForestpaleoData_proxy: Metadata identicalyearUnits: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2922 (Dana Plateau Inyo National Forest) and FE23_northamerica_usa_ca603 (DanaPlateauInyoNationalForest). geo_siteName: Metadata composited to: COMPOSITE: Dana Plateau Inyo National Forest + DanaPlateauInyoNationalForestpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_2922 (positive) and FE23_northamerica_usa_ca603 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_2922 (positive) and FE23_northamerica_usa_ca603 (N/A). interpretation_direction: Metadata composited to: COMPOSITE: positive + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_2922 (Annual) and FE23_northamerica_usa_ca603 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_2922 (Annual) and FE23_northamerica_usa_ca603 (N/A). interpretation_seasonality: Metadata composited to: COMPOSITE: Annual + N/A
saved figure in /figs//all_merged/dup_detection//composite_pages2k_2922_FE23_northamerica_usa_ca603.pdf
pages2k_2959 FE23_northamerica_mexico_mexi043
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLat<<< in: pages2k_2959 (25.1) and FE23_northamerica_mexico_mexi043 (25.066668).
Add the following note to duplicateDetails: Metadata differs for geo_meanLat in original records: pages2k_2959 (25.1) and FE23_northamerica_mexico_mexi043 (25.066668). geo_meanLat: Metadata averaged to: 25.083334
Add the following note to duplicateDetails: Metadata differs for geo_meanLat in original records: pages2k_2959 (25.1) and FE23_northamerica_mexico_mexi043 (25.066668). geo_meanLat: Metadata averaged to: 25.083334geo_meanLon: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_2959 (Cienega de Nuestra Senora de Guadalupe) and FE23_northamerica_mexico_mexi043 (CienegadeNuestraSenoradeGuadalupe).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2959 (Cienega de Nuestra Senora de Guadalupe) and FE23_northamerica_mexico_mexi043 (CienegadeNuestraSenoradeGuadalupe). geo_siteName: Metadata composited to: COMPOSITE: Cienega de Nuestra Senora de Guadalupe + CienegadeNuestraSenoradeGuadalupe
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2959 (Cienega de Nuestra Senora de Guadalupe) and FE23_northamerica_mexico_mexi043 (CienegadeNuestraSenoradeGuadalupe). geo_siteName: Metadata composited to: COMPOSITE: Cienega de Nuestra Senora de Guadalupe + CienegadeNuestraSenoradeGuadalupepaleoData_proxy: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2959 (Cienega de Nuestra Senora de Guadalupe) and FE23_northamerica_mexico_mexi043 (CienegadeNuestraSenoradeGuadalupe). geo_siteName: Metadata composited to: COMPOSITE: Cienega de Nuestra Senora de Guadalupe + CienegadeNuestraSenoradeGuadalupepaleoData_proxy: Metadata identicalyearUnits: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_variable<<< in: pages2k_2959 (temperature) and FE23_northamerica_mexico_mexi043 (temperature+moisture).
Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: pages2k_2959 (temperature) and FE23_northamerica_mexico_mexi043 (temperature+moisture). interpretation_variable: Metadata composited to: COMPOSITE: temperature + temperature+moisture
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_2959 (positive) and FE23_northamerica_mexico_mexi043 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_2959 (positive) and FE23_northamerica_mexico_mexi043 (N/A). interpretation_direction: Metadata composited to: COMPOSITE: positive + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_2959 (Summer) and FE23_northamerica_mexico_mexi043 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_2959 (Summer) and FE23_northamerica_mexico_mexi043 (N/A). interpretation_seasonality: Metadata composited to: COMPOSITE: Summer + N/A
saved figure in /figs//all_merged/dup_detection//composite_pages2k_2959_FE23_northamerica_mexico_mexi043.pdf
pages2k_2976 FE23_northamerica_usa_id008
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLat<<< in: pages2k_2976 (43.9) and FE23_northamerica_usa_id008 (43.866665).
Add the following note to duplicateDetails: Metadata differs for geo_meanLat in original records: pages2k_2976 (43.9) and FE23_northamerica_usa_id008 (43.866665). geo_meanLat: Metadata averaged to: 43.88333
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLon<<< in: pages2k_2976 (-114.7) and FE23_northamerica_usa_id008 (-114.71667).
Add the following note to duplicateDetails: Metadata differs for geo_meanLon in original records: pages2k_2976 (-114.7) and FE23_northamerica_usa_id008 (-114.71667). geo_meanLon: Metadata averaged to: -114.70833
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_2976 (Galena Pass Sawtooth National Forest) and FE23_northamerica_usa_id008 (GalenaPassSawtoothNationalForest).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2976 (Galena Pass Sawtooth National Forest) and FE23_northamerica_usa_id008 (GalenaPassSawtoothNationalForest). geo_siteName: Metadata composited to: COMPOSITE: Galena Pass Sawtooth National Forest + GalenaPassSawtoothNationalForest
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2976 (Galena Pass Sawtooth National Forest) and FE23_northamerica_usa_id008 (GalenaPassSawtoothNationalForest). geo_siteName: Metadata composited to: COMPOSITE: Galena Pass Sawtooth National Forest + GalenaPassSawtoothNationalForestpaleoData_proxy: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_2976 (Galena Pass Sawtooth National Forest) and FE23_northamerica_usa_id008 (GalenaPassSawtoothNationalForest). geo_siteName: Metadata composited to: COMPOSITE: Galena Pass Sawtooth National Forest + GalenaPassSawtoothNationalForestpaleoData_proxy: Metadata identicalyearUnits: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_variable<<< in: pages2k_2976 (None) and FE23_northamerica_usa_id008 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: pages2k_2976 (None) and FE23_northamerica_usa_id008 (N/A). interpretation_variable: Metadata composited to: COMPOSITE: None + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_2976 (None) and FE23_northamerica_usa_id008 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_2976 (None) and FE23_northamerica_usa_id008 (N/A). interpretation_direction: Metadata composited to: COMPOSITE: None + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_2976 (None) and FE23_northamerica_usa_id008 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_2976 (None) and FE23_northamerica_usa_id008 (N/A). interpretation_seasonality: Metadata composited to: COMPOSITE: None + N/A
saved figure in /figs//all_merged/dup_detection//composite_pages2k_2976_FE23_northamerica_usa_id008.pdf
pages2k_3002 FE23_northamerica_usa_or043
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLat<<< in: pages2k_3002 (45.3) and FE23_northamerica_usa_or043 (45.316666).
Add the following note to duplicateDetails: Metadata differs for geo_meanLat in original records: pages2k_3002 (45.3) and FE23_northamerica_usa_or043 (45.316666). geo_meanLat: Metadata averaged to: 45.308334
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLon<<< in: pages2k_3002 (-121.7) and FE23_northamerica_usa_or043 (-121.65).
Add the following note to duplicateDetails: Metadata differs for geo_meanLon in original records: pages2k_3002 (-121.7) and FE23_northamerica_usa_or043 (-121.65). geo_meanLon: Metadata averaged to: -121.675
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_3002 (Barlow Pass am Mt.Hood) and FE23_northamerica_usa_or043 (BarlowPassamMt.Hood).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3002 (Barlow Pass am Mt.Hood) and FE23_northamerica_usa_or043 (BarlowPassamMt.Hood). geo_siteName: Metadata composited to: COMPOSITE: Barlow Pass am Mt.Hood + BarlowPassamMt.Hood
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3002 (Barlow Pass am Mt.Hood) and FE23_northamerica_usa_or043 (BarlowPassamMt.Hood). geo_siteName: Metadata composited to: COMPOSITE: Barlow Pass am Mt.Hood + BarlowPassamMt.HoodpaleoData_proxy: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3002 (Barlow Pass am Mt.Hood) and FE23_northamerica_usa_or043 (BarlowPassamMt.Hood). geo_siteName: Metadata composited to: COMPOSITE: Barlow Pass am Mt.Hood + BarlowPassamMt.HoodpaleoData_proxy: Metadata identicalyearUnits: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_variable<<< in: pages2k_3002 (None) and FE23_northamerica_usa_or043 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: pages2k_3002 (None) and FE23_northamerica_usa_or043 (N/A). interpretation_variable: Metadata composited to: COMPOSITE: None + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_3002 (None) and FE23_northamerica_usa_or043 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_3002 (None) and FE23_northamerica_usa_or043 (N/A). interpretation_direction: Metadata composited to: COMPOSITE: None + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_3002 (None) and FE23_northamerica_usa_or043 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_3002 (None) and FE23_northamerica_usa_or043 (N/A). interpretation_seasonality: Metadata composited to: COMPOSITE: None + N/A
saved figure in /figs//all_merged/dup_detection//composite_pages2k_3002_FE23_northamerica_usa_or043.pdf
pages2k_3038 FE23_northamerica_usa_mt108
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLat<<< in: pages2k_3038 (45.8) and FE23_northamerica_usa_mt108 (45.75).
Add the following note to duplicateDetails: Metadata differs for geo_meanLat in original records: pages2k_3038 (45.8) and FE23_northamerica_usa_mt108 (45.75). geo_meanLat: Metadata averaged to: 45.775
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLon<<< in: pages2k_3038 (-112.5) and FE23_northamerica_usa_mt108 (-112.53333).
Add the following note to duplicateDetails: Metadata differs for geo_meanLon in original records: pages2k_3038 (-112.5) and FE23_northamerica_usa_mt108 (-112.53333). geo_meanLon: Metadata averaged to: -112.51666
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_3038 (Highland Fire Outlook) and FE23_northamerica_usa_mt108 (HighlandFireOutlook).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3038 (Highland Fire Outlook) and FE23_northamerica_usa_mt108 (HighlandFireOutlook). geo_siteName: Metadata composited to: COMPOSITE: Highland Fire Outlook + HighlandFireOutlook
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3038 (Highland Fire Outlook) and FE23_northamerica_usa_mt108 (HighlandFireOutlook). geo_siteName: Metadata composited to: COMPOSITE: Highland Fire Outlook + HighlandFireOutlookpaleoData_proxy: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3038 (Highland Fire Outlook) and FE23_northamerica_usa_mt108 (HighlandFireOutlook). geo_siteName: Metadata composited to: COMPOSITE: Highland Fire Outlook + HighlandFireOutlookpaleoData_proxy: Metadata identicalyearUnits: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_variable<<< in: pages2k_3038 (temperature) and FE23_northamerica_usa_mt108 (temperature+moisture).
Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: pages2k_3038 (temperature) and FE23_northamerica_usa_mt108 (temperature+moisture). interpretation_variable: Metadata composited to: COMPOSITE: temperature + temperature+moisture
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_3038 (positive) and FE23_northamerica_usa_mt108 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_3038 (positive) and FE23_northamerica_usa_mt108 (N/A). interpretation_direction: Metadata composited to: COMPOSITE: positive + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_3038 (Summer) and FE23_northamerica_usa_mt108 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_3038 (Summer) and FE23_northamerica_usa_mt108 (N/A). interpretation_seasonality: Metadata composited to: COMPOSITE: Summer + N/A
saved figure in /figs//all_merged/dup_detection//composite_pages2k_3038_FE23_northamerica_usa_mt108.pdf
pages2k_3085 iso2k_1556
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_3085 (Ningaloo) and iso2k_1556 (Ningaloo Reef, West Australia).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3085 (Ningaloo) and iso2k_1556 (Ningaloo Reef, West Australia). geo_siteName: Metadata composited to: COMPOSITE: Ningaloo + Ningaloo Reef, West Australia
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3085 (Ningaloo) and iso2k_1556 (Ningaloo Reef, West Australia). geo_siteName: Metadata composited to: COMPOSITE: Ningaloo + Ningaloo Reef, West AustraliapaleoData_proxy: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3085 (Ningaloo) and iso2k_1556 (Ningaloo Reef, West Australia). geo_siteName: Metadata composited to: COMPOSITE: Ningaloo + Ningaloo Reef, West AustraliapaleoData_proxy: Metadata identicalyearUnits: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_variable<<< in: pages2k_3085 (temperature) and iso2k_1556 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: pages2k_3085 (temperature) and iso2k_1556 (N/A). interpretation_variable: Metadata composited to: COMPOSITE: temperature + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_3085 (negative) and iso2k_1556 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_3085 (negative) and iso2k_1556 (N/A). interpretation_direction: Metadata composited to: COMPOSITE: negative + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_3085 (subannual) and iso2k_1556 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_3085 (subannual) and iso2k_1556 (N/A). interpretation_seasonality: Metadata composited to: COMPOSITE: subannual + N/A
saved figure in /figs//all_merged/dup_detection//composite_pages2k_3085_iso2k_1556.pdf
pages2k_3107 FE23_northamerica_usa_co552
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLon<<< in: pages2k_3107 (-107.7) and FE23_northamerica_usa_co552 (-107.71667).
Add the following note to duplicateDetails: Metadata differs for geo_meanLon in original records: pages2k_3107 (-107.7) and FE23_northamerica_usa_co552 (-107.71667). geo_meanLon: Metadata averaged to: -107.70833
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_3107 (Red Mountain Pass Silverton) and FE23_northamerica_usa_co552 (RedMountainPassSilverton).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3107 (Red Mountain Pass Silverton) and FE23_northamerica_usa_co552 (RedMountainPassSilverton). geo_siteName: Metadata composited to: COMPOSITE: Red Mountain Pass Silverton + RedMountainPassSilverton
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3107 (Red Mountain Pass Silverton) and FE23_northamerica_usa_co552 (RedMountainPassSilverton). geo_siteName: Metadata composited to: COMPOSITE: Red Mountain Pass Silverton + RedMountainPassSilvertonpaleoData_proxy: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3107 (Red Mountain Pass Silverton) and FE23_northamerica_usa_co552 (RedMountainPassSilverton). geo_siteName: Metadata composited to: COMPOSITE: Red Mountain Pass Silverton + RedMountainPassSilvertonpaleoData_proxy: Metadata identicalyearUnits: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_variable<<< in: pages2k_3107 (None) and FE23_northamerica_usa_co552 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: pages2k_3107 (None) and FE23_northamerica_usa_co552 (N/A). interpretation_variable: Metadata composited to: COMPOSITE: None + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_3107 (None) and FE23_northamerica_usa_co552 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_3107 (None) and FE23_northamerica_usa_co552 (N/A). interpretation_direction: Metadata composited to: COMPOSITE: None + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_3107 (None) and FE23_northamerica_usa_co552 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_3107 (None) and FE23_northamerica_usa_co552 (N/A). interpretation_seasonality: Metadata composited to: COMPOSITE: None + N/A
saved figure in /figs//all_merged/dup_detection//composite_pages2k_3107_FE23_northamerica_usa_co552.pdf
pages2k_3108 FE23_northamerica_usa_co552
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLon<<< in: pages2k_3108 (-107.7) and FE23_northamerica_usa_co552 (-107.71667).
Add the following note to duplicateDetails: Metadata differs for geo_meanLon in original records: pages2k_3108 (-107.7) and FE23_northamerica_usa_co552 (-107.71667). geo_meanLon: Metadata averaged to: -107.70833
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_3108 (Red Mountain Pass Silverton) and FE23_northamerica_usa_co552 (RedMountainPassSilverton).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3108 (Red Mountain Pass Silverton) and FE23_northamerica_usa_co552 (RedMountainPassSilverton). geo_siteName: Metadata composited to: COMPOSITE: Red Mountain Pass Silverton + RedMountainPassSilverton
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3108 (Red Mountain Pass Silverton) and FE23_northamerica_usa_co552 (RedMountainPassSilverton). geo_siteName: Metadata composited to: COMPOSITE: Red Mountain Pass Silverton + RedMountainPassSilvertonpaleoData_proxy: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3108 (Red Mountain Pass Silverton) and FE23_northamerica_usa_co552 (RedMountainPassSilverton). geo_siteName: Metadata composited to: COMPOSITE: Red Mountain Pass Silverton + RedMountainPassSilvertonpaleoData_proxy: Metadata identicalyearUnits: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_variable<<< in: pages2k_3108 (None) and FE23_northamerica_usa_co552 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: pages2k_3108 (None) and FE23_northamerica_usa_co552 (N/A). interpretation_variable: Metadata composited to: COMPOSITE: None + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_3108 (None) and FE23_northamerica_usa_co552 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_3108 (None) and FE23_northamerica_usa_co552 (N/A). interpretation_direction: Metadata composited to: COMPOSITE: None + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_3108 (None) and FE23_northamerica_usa_co552 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_3108 (None) and FE23_northamerica_usa_co552 (N/A). interpretation_seasonality: Metadata composited to: COMPOSITE: None + N/A
saved figure in /figs//all_merged/dup_detection//composite_pages2k_3108_FE23_northamerica_usa_co552.pdf
pages2k_3179 FE23_northamerica_usa_ak057
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLat<<< in: pages2k_3179 (65.2) and FE23_northamerica_usa_ak057 (65.183334).
Add the following note to duplicateDetails: Metadata differs for geo_meanLat in original records: pages2k_3179 (65.2) and FE23_northamerica_usa_ak057 (65.183334). geo_meanLat: Metadata averaged to: 65.191666
Add the following note to duplicateDetails: Metadata differs for geo_meanLat in original records: pages2k_3179 (65.2) and FE23_northamerica_usa_ak057 (65.183334). geo_meanLat: Metadata averaged to: 65.191666geo_meanLon: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_3179 (Almond Butter Lower) and FE23_northamerica_usa_ak057 (AlmondButterLower).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3179 (Almond Butter Lower) and FE23_northamerica_usa_ak057 (AlmondButterLower). geo_siteName: Metadata composited to: COMPOSITE: Almond Butter Lower + AlmondButterLower
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3179 (Almond Butter Lower) and FE23_northamerica_usa_ak057 (AlmondButterLower). geo_siteName: Metadata composited to: COMPOSITE: Almond Butter Lower + AlmondButterLowerpaleoData_proxy: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3179 (Almond Butter Lower) and FE23_northamerica_usa_ak057 (AlmondButterLower). geo_siteName: Metadata composited to: COMPOSITE: Almond Butter Lower + AlmondButterLowerpaleoData_proxy: Metadata identicalyearUnits: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_variable<<< in: pages2k_3179 (temperature) and FE23_northamerica_usa_ak057 (NOT temperature NOT moisture).
Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: pages2k_3179 (temperature) and FE23_northamerica_usa_ak057 (NOT temperature NOT moisture). interpretation_variable: Metadata composited to: COMPOSITE: temperature + NOT temperature NOT moisture
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_3179 (positive) and FE23_northamerica_usa_ak057 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_3179 (positive) and FE23_northamerica_usa_ak057 (N/A). interpretation_direction: Metadata composited to: COMPOSITE: positive + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_3179 (Summer) and FE23_northamerica_usa_ak057 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_3179 (Summer) and FE23_northamerica_usa_ak057 (N/A). interpretation_seasonality: Metadata composited to: COMPOSITE: Summer + N/A
saved figure in /figs//all_merged/dup_detection//composite_pages2k_3179_FE23_northamerica_usa_ak057.pdf
pages2k_3196 FE23_asia_mong011
Add the following note to duplicateDetails: archiveType: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanElev<<< in: pages2k_3196 (3700.0) and FE23_asia_mong011 (1900.0).
Add the following note to duplicateDetails: Metadata differs for geo_meanElev in original records: pages2k_3196 (3700.0) and FE23_asia_mong011 (1900.0). geo_meanElev: Metadata averaged to: 2800.0
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLat<<< in: pages2k_3196 (48.13) and FE23_asia_mong011 (48.15).
Add the following note to duplicateDetails: Metadata differs for geo_meanLat in original records: pages2k_3196 (48.13) and FE23_asia_mong011 (48.15). geo_meanLat: Metadata averaged to: 48.14
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLon<<< in: pages2k_3196 (100.27) and FE23_asia_mong011 (100.28333).
Add the following note to duplicateDetails: Metadata differs for geo_meanLon in original records: pages2k_3196 (100.27) and FE23_asia_mong011 (100.28333). geo_meanLon: Metadata averaged to: 100.276665
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_3196 (MONG011) and FE23_asia_mong011 (ZuunSalaaMod).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3196 (MONG011) and FE23_asia_mong011 (ZuunSalaaMod). geo_siteName: Metadata composited to: COMPOSITE: MONG011 + ZuunSalaaMod
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3196 (MONG011) and FE23_asia_mong011 (ZuunSalaaMod). geo_siteName: Metadata composited to: COMPOSITE: MONG011 + ZuunSalaaModpaleoData_proxy: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3196 (MONG011) and FE23_asia_mong011 (ZuunSalaaMod). geo_siteName: Metadata composited to: COMPOSITE: MONG011 + ZuunSalaaModpaleoData_proxy: Metadata identicalyearUnits: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3196 (MONG011) and FE23_asia_mong011 (ZuunSalaaMod). geo_siteName: Metadata composited to: COMPOSITE: MONG011 + ZuunSalaaModpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_3196 (positive) and FE23_asia_mong011 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_3196 (positive) and FE23_asia_mong011 (N/A). interpretation_direction: Metadata composited to: COMPOSITE: positive + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_3196 (Annual) and FE23_asia_mong011 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_3196 (Annual) and FE23_asia_mong011 (N/A). interpretation_seasonality: Metadata composited to: COMPOSITE: Annual + N/A
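For the data values, the notebook's changelog describes the composite option as creating z-scores and averaging over the shared time period. A minimal sketch of that idea is given below. It is not the notebook's implementation: the function name zscore_composite is hypothetical, and it assumes that each record has unique year values and that the z-scores are computed over the shared years only.

```python
import numpy as np


def zscore_composite(year_a, values_a, year_b, values_b):
    """Average the z-scores of two records over their shared years.

    Hypothetical helper; assumes unique years within each record.
    Returns (shared_years, composite_values).
    """
    year_a, year_b = np.asarray(year_a), np.asarray(year_b)
    values_a = np.asarray(values_a, dtype=float)
    values_b = np.asarray(values_b, dtype=float)

    # Years present in both records, with their positions in each record.
    shared, idx_a, idx_b = np.intersect1d(year_a, year_b, return_indices=True)
    if shared.size < 2:
        raise ValueError("need at least two shared years to standardise")

    a, b = values_a[idx_a], values_b[idx_b]

    # Standardise each record over the shared period (assumption).
    # A constant record has zero standard deviation and cannot be standardised.
    z_a = (a - a.mean()) / a.std()
    z_b = (b - b.mean()) / b.std()

    return shared, (z_a + z_b) / 2


if __name__ == "__main__":
    years, comp = zscore_composite(
        [1, 2, 3, 4], [0.0, 1.0, 2.0, 3.0],
        [2, 3, 4, 5], [10.0, 20.0, 30.0, 40.0],
    )
    print(years, comp)
```

Standardising before averaging puts both records on a common scale, so the composite does not depend on the units of the original records.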
saved figure in /figs//all_merged/dup_detection//composite_pages2k_3196_FE23_asia_mong011.pdf
pages2k_3313 FE23_northamerica_usa_ca560
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLon<<< in: pages2k_3313 (-119.3) and FE23_northamerica_usa_ca560 (-119.25).
Add the following note to duplicateDetails: Metadata differs for geo_meanLon in original records: pages2k_3313 (-119.3) and FE23_northamerica_usa_ca560 (-119.25). geo_meanLon: Metadata averaged to: -119.275
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_3313 (Yosemite Park E Eingang) and FE23_northamerica_usa_ca560 (YosemiteParkEEingang).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3313 (Yosemite Park E Eingang) and FE23_northamerica_usa_ca560 (YosemiteParkEEingang). geo_siteName: Metadata composited to: COMPOSITE: Yosemite Park E Eingang + YosemiteParkEEingang
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3313 (Yosemite Park E Eingang) and FE23_northamerica_usa_ca560 (YosemiteParkEEingang). geo_siteName: Metadata composited to: COMPOSITE: Yosemite Park E Eingang + YosemiteParkEEingangpaleoData_proxy: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3313 (Yosemite Park E Eingang) and FE23_northamerica_usa_ca560 (YosemiteParkEEingang). geo_siteName: Metadata composited to: COMPOSITE: Yosemite Park E Eingang + YosemiteParkEEingangpaleoData_proxy: Metadata identicalyearUnits: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_variable<<< in: pages2k_3313 (None) and FE23_northamerica_usa_ca560 (moisture).
Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: pages2k_3313 (None) and FE23_northamerica_usa_ca560 (moisture). interpretation_variable: Metadata composited to: COMPOSITE: None + moisture
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_3313 (None) and FE23_northamerica_usa_ca560 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_3313 (None) and FE23_northamerica_usa_ca560 (N/A). interpretation_direction: Metadata composited to: COMPOSITE: None + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_3313 (None) and FE23_northamerica_usa_ca560 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_3313 (None) and FE23_northamerica_usa_ca560 (N/A). interpretation_seasonality: Metadata composited to: COMPOSITE: None + N/A
saved figure in /figs//all_merged/dup_detection//composite_pages2k_3313_FE23_northamerica_usa_ca560.pdf
pages2k_3404 FE23_northamerica_canada_cana029
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLat<<< in: pages2k_3404 (68.6) and FE23_northamerica_canada_cana029 (68.63333).
Add the following note to duplicateDetails: Metadata differs for geo_meanLat in original records: pages2k_3404 (68.6) and FE23_northamerica_canada_cana029 (68.63333). geo_meanLat: Metadata averaged to: 68.61667
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLon<<< in: pages2k_3404 (-138.6) and FE23_northamerica_canada_cana029 (-138.63333).
Add the following note to duplicateDetails: Metadata differs for geo_meanLon in original records: pages2k_3404 (-138.6) and FE23_northamerica_canada_cana029 (-138.63333). geo_meanLon: Metadata averaged to: -138.61667
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_3404 (Spruce Creek) and FE23_northamerica_canada_cana029 (SpruceCreek).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3404 (Spruce Creek) and FE23_northamerica_canada_cana029 (SpruceCreek). geo_siteName: Metadata composited to: COMPOSITE: Spruce Creek + SpruceCreek
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3404 (Spruce Creek) and FE23_northamerica_canada_cana029 (SpruceCreek). geo_siteName: Metadata composited to: COMPOSITE: Spruce Creek + SpruceCreekpaleoData_proxy: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3404 (Spruce Creek) and FE23_northamerica_canada_cana029 (SpruceCreek). geo_siteName: Metadata composited to: COMPOSITE: Spruce Creek + SpruceCreekpaleoData_proxy: Metadata identicalyearUnits: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_variable<<< in: pages2k_3404 (temperature) and FE23_northamerica_canada_cana029 (NOT temperature NOT moisture).
Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: pages2k_3404 (temperature) and FE23_northamerica_canada_cana029 (NOT temperature NOT moisture). interpretation_variable: Metadata composited to: COMPOSITE: temperature + NOT temperature NOT moisture
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_3404 (positive) and FE23_northamerica_canada_cana029 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_3404 (positive) and FE23_northamerica_canada_cana029 (N/A). interpretation_direction: Metadata composited to: COMPOSITE: positive + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_3404 (Summer) and FE23_northamerica_canada_cana029 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_3404 (Summer) and FE23_northamerica_canada_cana029 (N/A). interpretation_seasonality: Metadata composited to: COMPOSITE: Summer + N/A
saved figure in /figs//all_merged/dup_detection//composite_pages2k_3404_FE23_northamerica_canada_cana029.pdf
pages2k_3503 FE23_northamerica_usa_wa072
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLat<<< in: pages2k_3503 (48.7) and FE23_northamerica_usa_wa072 (48.733334).
Add the following note to duplicateDetails: Metadata differs for geo_meanLat in original records: pages2k_3503 (48.7) and FE23_northamerica_usa_wa072 (48.733334). geo_meanLat: Metadata averaged to: 48.716667
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLon<<< in: pages2k_3503 (-120.7) and FE23_northamerica_usa_wa072 (-120.65).
Add the following note to duplicateDetails: Metadata differs for geo_meanLon in original records: pages2k_3503 (-120.7) and FE23_northamerica_usa_wa072 (-120.65). geo_meanLon: Metadata averaged to: -120.675
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_3503 (Harts Pass N2) and FE23_northamerica_usa_wa072 (Hart'sPassN2).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3503 (Harts Pass N2) and FE23_northamerica_usa_wa072 (Hart'sPassN2). geo_siteName: Metadata composited to: COMPOSITE: Harts Pass N2 + Hart'sPassN2
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3503 (Harts Pass N2) and FE23_northamerica_usa_wa072 (Hart'sPassN2). geo_siteName: Metadata composited to: COMPOSITE: Harts Pass N2 + Hart'sPassN2paleoData_proxy: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3503 (Harts Pass N2) and FE23_northamerica_usa_wa072 (Hart'sPassN2). geo_siteName: Metadata composited to: COMPOSITE: Harts Pass N2 + Hart'sPassN2paleoData_proxy: Metadata identicalyearUnits: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3503 (Harts Pass N2) and FE23_northamerica_usa_wa072 (Hart'sPassN2). geo_siteName: Metadata composited to: COMPOSITE: Harts Pass N2 + Hart'sPassN2paleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_3503 (positive) and FE23_northamerica_usa_wa072 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_3503 (positive) and FE23_northamerica_usa_wa072 (N/A). interpretation_direction: Metadata composited to: COMPOSITE: positive + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_3503 (Summer) and FE23_northamerica_usa_wa072 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_3503 (Summer) and FE23_northamerica_usa_wa072 (N/A). interpretation_seasonality: Metadata composited to: COMPOSITE: Summer + N/A
saved figure in /figs//all_merged/dup_detection//composite_pages2k_3503_FE23_northamerica_usa_wa072.pdf
pages2k_3524 FE23_northamerica_usa_ak010
Add the following note to duplicateDetails: archiveType: Metadata identical
Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLat<<< in: pages2k_3524 (61.8) and FE23_northamerica_usa_ak010 (61.833332).
Add the following note to duplicateDetails: Metadata differs for geo_meanLat in original records: pages2k_3524 (61.8) and FE23_northamerica_usa_ak010 (61.833332). geo_meanLat: Metadata averaged to: 61.816666
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLon<<< in: pages2k_3524 (-147.3) and FE23_northamerica_usa_ak010 (-147.33333).
Add the following note to duplicateDetails: Metadata differs for geo_meanLon in original records: pages2k_3524 (-147.3) and FE23_northamerica_usa_ak010 (-147.33333). geo_meanLon: Metadata averaged to: -147.31667
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_3524 (Eureka Summit) and FE23_northamerica_usa_ak010 (EurekaSummit).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3524 (Eureka Summit) and FE23_northamerica_usa_ak010 (EurekaSummit). geo_siteName: Metadata composited to: COMPOSITE: Eureka Summit + EurekaSummit
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3524 (Eureka Summit) and FE23_northamerica_usa_ak010 (EurekaSummit). geo_siteName: Metadata composited to: COMPOSITE: Eureka Summit + EurekaSummitpaleoData_proxy: Metadata identical
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3524 (Eureka Summit) and FE23_northamerica_usa_ak010 (EurekaSummit). geo_siteName: Metadata composited to: COMPOSITE: Eureka Summit + EurekaSummitpaleoData_proxy: Metadata identicalyearUnits: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_variable<<< in: pages2k_3524 (temperature) and FE23_northamerica_usa_ak010 (NOT temperature NOT moisture).
Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: pages2k_3524 (temperature) and FE23_northamerica_usa_ak010 (NOT temperature NOT moisture). interpretation_variable: Metadata composited to: COMPOSITE: temperature + NOT temperature NOT moisture
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_3524 (positive) and FE23_northamerica_usa_ak010 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_3524 (positive) and FE23_northamerica_usa_ak010 (N/A). interpretation_direction: Metadata composited to: COMPOSITE: positive + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_3524 (Summer) and FE23_northamerica_usa_ak010 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_3524 (Summer) and FE23_northamerica_usa_ak010 (N/A). interpretation_seasonality: Metadata composited to: COMPOSITE: Summer + N/A
saved figure in /figs//all_merged/dup_detection//composite_pages2k_3524_FE23_northamerica_usa_ak010.pdf
pages2k_3550 FE23_asia_russ137w
archiveType: Metadata identical
geo_meanElev: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLat<<< in: pages2k_3550 (50.48) and FE23_asia_russ137w (50.483334).
Add the following note to duplicateDetails: Metadata differs for geo_meanLat in original records: pages2k_3550 (50.48) and FE23_asia_russ137w (50.483334).
geo_meanLat: Metadata averaged to: 50.481667
geo_meanLon: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_3550 (Altai Mt., Ust Ulagan Lake) and FE23_asia_russ137w (UstUlaganLake(Altai)).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3550 (Altai Mt., Ust Ulagan Lake) and FE23_asia_russ137w (UstUlaganLake(Altai)).
geo_siteName: Metadata composited to: COMPOSITE: Altai Mt., Ust Ulagan Lake + UstUlaganLake(Altai)
paleoData_proxy: Metadata identical
yearUnits: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_variable<<< in: pages2k_3550 (temperature) and FE23_asia_russ137w (moisture).
Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: pages2k_3550 (temperature) and FE23_asia_russ137w (moisture).
interpretation_variable: Metadata composited to: COMPOSITE: temperature + moisture
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_3550 (positive) and FE23_asia_russ137w (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_3550 (positive) and FE23_asia_russ137w (N/A).
interpretation_direction: Metadata composited to: COMPOSITE: positive + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_3550 (Summer) and FE23_asia_russ137w (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_3550 (Summer) and FE23_asia_russ137w (N/A).
interpretation_seasonality: Metadata composited to: COMPOSITE: Summer + N/A
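The log entries for this pair suggest a simple per-field rule: identical values are kept, differing numeric values (such as geo_meanLat) are averaged, and differing text values (such as geo_siteName) are joined into a "COMPOSITE: a + b" string. The following is only a minimal sketch of that rule as inferred from the output; the function name `merge_metadata_field` and its signature are hypothetical and are not taken from the notebook's code.

```python
from numbers import Real


def merge_metadata_field(name, value_a, value_b, id_a, id_b):
    """Hypothetical helper: merge one metadata field of a duplicate pair.

    Returns (merged_value, note). The note wording mimics the log output;
    the notebook's actual implementation may differ.
    """
    if value_a == value_b:
        return value_a, f"{name}: Metadata identical"

    differs = (
        f"Metadata differs for {name} in original records: "
        f"{id_a} ({value_a}) and {id_b} ({value_b})."
    )

    # Assumption: only real numbers (not booleans) are averaged.
    both_numeric = all(
        isinstance(v, Real) and not isinstance(v, bool)
        for v in (value_a, value_b)
    )
    if both_numeric:
        merged = (value_a + value_b) / 2
        return merged, f"{differs} {name}: Metadata averaged to: {merged:.6f}"

    # Everything else is kept as text, recording both original values.
    merged = f"COMPOSITE: {value_a} + {value_b}"
    return merged, f"{differs} {name}: Metadata composited to: {merged}"


ids = ("pages2k_3550", "FE23_asia_russ137w")
lat, lat_note = merge_metadata_field("geo_meanLat", 50.48, 50.483334, *ids)
site, site_note = merge_metadata_field(
    "geo_siteName", "Altai Mt., Ust Ulagan Lake", "UstUlaganLake(Altai)", *ids
)
proxy, proxy_note = merge_metadata_field("paleoData_proxy", "ring width", "ring width", *ids)
print(lat_note)
print(site_note)
print(proxy_note)
```

The proxy value "ring width" is a placeholder for illustration; the log does not state the proxy of these records. Averaged values in the log can differ from this sketch in the last digits (for example 40.574997 for 40.6 and 40.55), which may reflect single-precision storage in the original dataframe.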
saved figure in /figs//all_merged/dup_detection//composite_pages2k_3550_FE23_asia_russ137w.pdf
pages2k_3583 FE23_northamerica_usa_co633
archiveType: Metadata identical
geo_meanElev: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLat<<< in: pages2k_3583 (40.6) and FE23_northamerica_usa_co633 (40.55).
Add the following note to duplicateDetails: Metadata differs for geo_meanLat in original records: pages2k_3583 (40.6) and FE23_northamerica_usa_co633 (40.55).
geo_meanLat: Metadata averaged to: 40.574997
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLon<<< in: pages2k_3583 (-105.8) and FE23_northamerica_usa_co633 (-105.833336).
Add the following note to duplicateDetails: Metadata differs for geo_meanLon in original records: pages2k_3583 (-105.8) and FE23_northamerica_usa_co633 (-105.833336).
geo_meanLon: Metadata averaged to: -105.816666
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_3583 (Cameron Pass) and FE23_northamerica_usa_co633 (CameronPass).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3583 (Cameron Pass) and FE23_northamerica_usa_co633 (CameronPass).
geo_siteName: Metadata composited to: COMPOSITE: Cameron Pass + CameronPass
paleoData_proxy: Metadata identical
yearUnits: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_variable<<< in: pages2k_3583 (temperature) and FE23_northamerica_usa_co633 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: pages2k_3583 (temperature) and FE23_northamerica_usa_co633 (N/A).
interpretation_variable: Metadata composited to: COMPOSITE: temperature + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_3583 (positive) and FE23_northamerica_usa_co633 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_3583 (positive) and FE23_northamerica_usa_co633 (N/A).
interpretation_direction: Metadata composited to: COMPOSITE: positive + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_3583 (Summer) and FE23_northamerica_usa_co633 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_3583 (Summer) and FE23_northamerica_usa_co633 (N/A).
interpretation_seasonality: Metadata composited to: COMPOSITE: Summer + N/A
saved figure in /figs//all_merged/dup_detection//composite_pages2k_3583_FE23_northamerica_usa_co633.pdf
pages2k_3642 FE23_northamerica_usa_wy025
archiveType: Metadata identical
geo_meanElev: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLat<<< in: pages2k_3642 (43.7) and FE23_northamerica_usa_wy025 (43.716667).
Add the following note to duplicateDetails: Metadata differs for geo_meanLat in original records: pages2k_3642 (43.7) and FE23_northamerica_usa_wy025 (43.716667).
geo_meanLat: Metadata averaged to: 43.708336
--------------------------------------------------------------------------------
Metadata different for >>>geo_meanLon<<< in: pages2k_3642 (-110.1) and FE23_northamerica_usa_wy025 (-110.05).
Add the following note to duplicateDetails: Metadata differs for geo_meanLon in original records: pages2k_3642 (-110.1) and FE23_northamerica_usa_wy025 (-110.05).
geo_meanLon: Metadata averaged to: -110.075
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: pages2k_3642 (Togwatee Pass) and FE23_northamerica_usa_wy025 (TogwateePass).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: pages2k_3642 (Togwatee Pass) and FE23_northamerica_usa_wy025 (TogwateePass).
geo_siteName: Metadata composited to: COMPOSITE: Togwatee Pass + TogwateePass
paleoData_proxy: Metadata identical
yearUnits: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_variable<<< in: pages2k_3642 (temperature) and FE23_northamerica_usa_wy025 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: pages2k_3642 (temperature) and FE23_northamerica_usa_wy025 (N/A).
interpretation_variable: Metadata composited to: COMPOSITE: temperature + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_direction<<< in: pages2k_3642 (positive) and FE23_northamerica_usa_wy025 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_direction in original records: pages2k_3642 (positive) and FE23_northamerica_usa_wy025 (N/A).
interpretation_direction: Metadata composited to: COMPOSITE: positive + N/A
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_seasonality<<< in: pages2k_3642 (Summer) and FE23_northamerica_usa_wy025 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_seasonality in original records: pages2k_3642 (Summer) and FE23_northamerica_usa_wy025 (N/A).
interpretation_seasonality: Metadata composited to: COMPOSITE: Summer + N/A
saved figure in /figs//all_merged/dup_detection//composite_pages2k_3642_FE23_northamerica_usa_wy025.pdf
FE23_europe_swed019w FE23_europe_swed021w
archiveType: Metadata identical
geo_meanElev: Metadata identical
geo_meanLat: Metadata identical
geo_meanLon: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: FE23_europe_swed019w (Torneträskr+f.,Bartoli) and FE23_europe_swed021w (Torneträskfos.,Bartoli).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: FE23_europe_swed019w (Torneträskr+f.,Bartoli) and FE23_europe_swed021w (Torneträskfos.,Bartoli).
geo_siteName: Metadata composited to: COMPOSITE: Torneträskr+f.,Bartoli + Torneträskfos.,Bartoli
paleoData_proxy: Metadata identical
yearUnits: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_variable<<< in: FE23_europe_swed019w (NOT temperature NOT moisture) and FE23_europe_swed021w (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: FE23_europe_swed019w (NOT temperature NOT moisture) and FE23_europe_swed021w (N/A).
interpretation_variable: Metadata composited to: COMPOSITE: NOT temperature NOT moisture + N/A
interpretation_direction: Metadata identical
interpretation_seasonality: Metadata identical
saved figure in /figs//all_merged/dup_detection//composite_FE23_europe_swed019w_FE23_europe_swed021w.pdf
FE23_northamerica_mexico_mexi022 FE23_northamerica_mexico_mexi023
archiveType: Metadata identical
geo_meanElev: Metadata identical
geo_meanLat: Metadata identical
geo_meanLon: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: FE23_northamerica_mexico_mexi022 (CerroBaraja) and FE23_northamerica_mexico_mexi023 (CerroBarajaandLosAngelesSawmill).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: FE23_northamerica_mexico_mexi022 (CerroBaraja) and FE23_northamerica_mexico_mexi023 (CerroBarajaandLosAngelesSawmill).
geo_siteName: Metadata composited to: COMPOSITE: CerroBaraja + CerroBarajaandLosAngelesSawmill
paleoData_proxy: Metadata identical
yearUnits: Metadata identical
interpretation_variable: Metadata identical
interpretation_direction: Metadata identical
interpretation_seasonality: Metadata identical
saved figure in /figs//all_merged/dup_detection//composite_FE23_northamerica_mexico_mexi022_FE23_northamerica_mexico_mexi023.pdf
FE23_australia_newz008 FE23_australia_newz092
archiveType: Metadata identical
geo_meanElev: Metadata identical
geo_meanLat: Metadata identical
geo_meanLon: Metadata identical
geo_siteName: Metadata identical
paleoData_proxy: Metadata identical
yearUnits: Metadata identical
interpretation_variable: Metadata identical
interpretation_direction: Metadata identical
interpretation_seasonality: Metadata identical
saved figure in /figs//all_merged/dup_detection//composite_FE23_australia_newz008_FE23_australia_newz092.pdf
FE23_northamerica_usa_ca512 FE23_northamerica_usa_ca613
archiveType: Metadata identical
geo_meanElev: Metadata identical
geo_meanLat: Metadata identical
geo_meanLon: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: FE23_northamerica_usa_ca512 (SantaAna) and FE23_northamerica_usa_ca613 (SantaAnaMts.(NewandOld)).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: FE23_northamerica_usa_ca512 (SantaAna) and FE23_northamerica_usa_ca613 (SantaAnaMts.(NewandOld)).
geo_siteName: Metadata composited to: COMPOSITE: SantaAna + SantaAnaMts.(NewandOld)
paleoData_proxy: Metadata identical
yearUnits: Metadata identical
interpretation_variable: Metadata identical
interpretation_direction: Metadata identical
interpretation_seasonality: Metadata identical
saved figure in /figs//all_merged/dup_detection//composite_FE23_northamerica_usa_ca512_FE23_northamerica_usa_ca613.pdf
FE23_northamerica_usa_me017 FE23_northamerica_usa_me018
archiveType: Metadata identical
geo_meanElev: Metadata identical
geo_meanLat: Metadata identical
geo_meanLon: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: FE23_northamerica_usa_me017 (IronboundIsland) and FE23_northamerica_usa_me018 (IronboundIslandLongCores).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: FE23_northamerica_usa_me017 (IronboundIsland) and FE23_northamerica_usa_me018 (IronboundIslandLongCores).
geo_siteName: Metadata composited to: COMPOSITE: IronboundIsland + IronboundIslandLongCores
paleoData_proxy: Metadata identical
yearUnits: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_variable<<< in: FE23_northamerica_usa_me017 (moisture) and FE23_northamerica_usa_me018 (temperature+moisture).
Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: FE23_northamerica_usa_me017 (moisture) and FE23_northamerica_usa_me018 (temperature+moisture).
interpretation_variable: Metadata composited to: COMPOSITE: moisture + temperature+moisture
interpretation_direction: Metadata identical
interpretation_seasonality: Metadata identical
saved figure in /figs//all_merged/dup_detection//composite_FE23_northamerica_usa_me017_FE23_northamerica_usa_me018.pdf
FE23_northamerica_usa_mt112 FE23_northamerica_usa_mt113
archiveType: Metadata identical
geo_meanElev: Metadata identical
geo_meanLat: Metadata identical
geo_meanLon: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: FE23_northamerica_usa_mt112 (YellowMountainRidge1) and FE23_northamerica_usa_mt113 (YellowMountainRidge1-EntireBarkTrees).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: FE23_northamerica_usa_mt112 (YellowMountainRidge1) and FE23_northamerica_usa_mt113 (YellowMountainRidge1-EntireBarkTrees).
geo_siteName: Metadata composited to: COMPOSITE: YellowMountainRidge1 + YellowMountainRidge1-EntireBarkTrees
paleoData_proxy: Metadata identical
yearUnits: Metadata identical
interpretation_variable: Metadata identical
interpretation_direction: Metadata identical
interpretation_seasonality: Metadata identical
saved figure in /figs//all_merged/dup_detection//composite_FE23_northamerica_usa_mt112_FE23_northamerica_usa_mt113.pdf
FE23_northamerica_usa_nj001 FE23_northamerica_usa_nj002
archiveType: Metadata identical
geo_meanElev: Metadata identical
geo_meanLat: Metadata identical
geo_meanLon: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: FE23_northamerica_usa_nj001 (HutchensonForestwithLongCores) and FE23_northamerica_usa_nj002 (HutchensonForest).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: FE23_northamerica_usa_nj001 (HutchensonForestwithLongCores) and FE23_northamerica_usa_nj002 (HutchensonForest).
geo_siteName: Metadata composited to: COMPOSITE: HutchensonForestwithLongCores + HutchensonForest
paleoData_proxy: Metadata identical
yearUnits: Metadata identical
interpretation_variable: Metadata identical
interpretation_direction: Metadata identical
interpretation_seasonality: Metadata identical
saved figure in /figs//all_merged/dup_detection//composite_FE23_northamerica_usa_nj001_FE23_northamerica_usa_nj002.pdf
ch2k_KU99HOU01_40 iso2k_788
archiveType: Metadata identical
geo_meanElev: Metadata identical
geo_meanLat: Metadata identical
geo_meanLon: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>geo_siteName<<< in: ch2k_KU99HOU01_40 (Houtman Abrolhos Islands, Australia) and iso2k_788 (Houtman Abrolhos Islands).
Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: ch2k_KU99HOU01_40 (Houtman Abrolhos Islands, Australia) and iso2k_788 (Houtman Abrolhos Islands).
geo_siteName: Metadata composited to: COMPOSITE: Houtman Abrolhos Islands, Australia + Houtman Abrolhos Islands
paleoData_proxy: Metadata identical
yearUnits: Metadata identical
--------------------------------------------------------------------------------
Metadata different for >>>interpretation_variable<<< in: ch2k_KU99HOU01_40 (temperature+moisture) and iso2k_788 (N/A).
Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: ch2k_KU99HOU01_40 (temperature+moisture) and iso2k_788 (N/A).
interpretation_variable: Metadata composited to: COMPOSITE: temperature+moisture + N/A
interpretation_direction: Metadata identical
interpretation_seasonality: Metadata identical
saved figure in /figs//all_merged/dup_detection//composite_ch2k_KU99HOU01_40_iso2k_788.pdf ch2k_KU00NIN01_150 iso2k_1556 Add the following note to duplicateDetails: archiveType: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identical -------------------------------------------------------------------------------- Metadata different for >>>geo_siteName<<< in: ch2k_KU00NIN01_150 (Ningaloo Reef, Australia) and iso2k_1556 (Ningaloo Reef, West Australia). Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: ch2k_KU00NIN01_150 (Ningaloo Reef, Australia) and iso2k_1556 (Ningaloo Reef, West Australia). geo_siteName: Metadata composited to: COMPOSITE: Ningaloo Reef, Australia + Ningaloo Reef, West Australia Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: ch2k_KU00NIN01_150 (Ningaloo Reef, Australia) and iso2k_1556 (Ningaloo Reef, West Australia). geo_siteName: Metadata composited to: COMPOSITE: Ningaloo Reef, Australia + Ningaloo Reef, West AustraliapaleoData_proxy: Metadata identical Add the following note to duplicateDetails: Metadata differs for geo_siteName in original records: ch2k_KU00NIN01_150 (Ningaloo Reef, Australia) and iso2k_1556 (Ningaloo Reef, West Australia). 
geo_siteName: Metadata composited to: COMPOSITE: Ningaloo Reef, Australia + Ningaloo Reef, West AustraliapaleoData_proxy: Metadata identicalyearUnits: Metadata identical -------------------------------------------------------------------------------- Metadata different for >>>interpretation_variable<<< in: ch2k_KU00NIN01_150 (temperature+moisture) and iso2k_1556 (N/A). Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: ch2k_KU00NIN01_150 (temperature+moisture) and iso2k_1556 (N/A). interpretation_variable: Metadata composited to: COMPOSITE: temperature+moisture + N/A Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: ch2k_KU00NIN01_150 (temperature+moisture) and iso2k_1556 (N/A). interpretation_variable: Metadata composited to: COMPOSITE: temperature+moisture + N/Ainterpretation_direction: Metadata identical Add the following note to duplicateDetails: Metadata differs for interpretation_variable in original records: ch2k_KU00NIN01_150 (temperature+moisture) and iso2k_1556 (N/A). interpretation_variable: Metadata composited to: COMPOSITE: temperature+moisture + N/Ainterpretation_direction: Metadata identicalinterpretation_seasonality: Metadata identical
saved figure in /figs//all_merged/dup_detection//composite_ch2k_KU00NIN01_150_iso2k_1556.pdf iso2k_786 iso2k_788 Add the following note to duplicateDetails: archiveType: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identicalinterpretation_direction: Metadata identical Add the following note to 
duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identicalinterpretation_direction: Metadata identicalinterpretation_seasonality: Metadata identical
saved figure in /figs//all_merged/dup_detection//composite_iso2k_786_iso2k_788.pdf iso2k_1554 iso2k_1556 Add the following note to duplicateDetails: archiveType: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identical Add the following note to duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identicalinterpretation_direction: Metadata identical Add the following note to 
duplicateDetails: archiveType: Metadata identicalgeo_meanElev: Metadata identicalgeo_meanLat: Metadata identicalgeo_meanLon: Metadata identicalgeo_siteName: Metadata identicalpaleoData_proxy: Metadata identicalyearUnits: Metadata identicalinterpretation_variable: Metadata identicalinterpretation_direction: Metadata identicalinterpretation_seasonality: Metadata identical
saved figure in /figs//all_merged/dup_detection//composite_iso2k_1554_iso2k_1556.pdf
3. Check for overlap between REMOVE and COMPOSITE
The duplicate-free dataframe is obtained by joining
- df_dupfree_rmv (duplicate free because all records with the decision REMOVE and/or COMPOSITE have been removed) and
- df_composite (duplicate free because the duplicates have been composited).
Duplicates might still exist between the two dataframes when a record has been associated with more than one duplicate candidate pair.
The scenarios for duplicates appearing twice:
1. REMOVE/KEEP and COMPOSITE:
- duplicate pair a and b have had the decisions assigned: a -> REMOVE, b -> KEEP
- duplicate pair a and c have had the decisions assigned: a -> COMPOSITE, c -> COMPOSITE.
In this case, b and ac (the composite record of a and c) would be duplicates in the merged dataframe.
2a. REMOVE/KEEP & KEEP/REMOVE:
- duplicate pair a and b have had the decisions assigned: a -> REMOVE, b -> KEEP
- duplicate pair a and c have had the decisions assigned: a -> KEEP, c -> REMOVE.
In this case a would still be removed, as REMOVE overrides KEEP in the algorithm. So only b will be kept and no duplicates would remain.
2b. REMOVE/KEEP & REMOVE/KEEP:
- duplicate pair a and b have had the decisions assigned: a -> REMOVE, b -> KEEP
- duplicate pair a and c have had the decisions assigned: a -> REMOVE, c -> KEEP.
In this case, a would be removed, but b and c will be kept and would be duplicates in the merged dataframe.
3. COMPOSITE x 2:
- duplicate pair a and b have had the decisions assigned: a -> COMPOSITE, b -> COMPOSITE
- duplicate pair a and c have had the decisions assigned: a -> COMPOSITE, c -> COMPOSITE.
In this case, ab and ac would be duplicates in the merged dataframe.
4. REMOVE/KEEP and KEEP/KEEP:
- duplicate pair a and b have had the decisions assigned: a -> REMOVE, b -> KEEP
- duplicate pair a and c have had the decisions assigned: a -> KEEP, c -> KEEP.
In this case a would be removed, b and c would be kept, but as c is not a duplicate of a, no duplicates would remain.
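The scenarios above can be traced with a small, self-contained sketch. It is not the notebook's implementation: the tuple layout of the pairs, the helper names `collect_decisions` and `surviving_records`, and the convention of naming a composite by concatenating its two IDs are assumptions made for illustration. The only rules taken from the text are that REMOVE overrides KEEP and that composited originals are replaced by their composite.

```python
from collections import defaultdict

def collect_decisions(pairs):
    """Map each record ID to every decision it received across all pairs."""
    decisions = defaultdict(list)
    for id1, id2, dec1, dec2 in pairs:
        decisions[id1].append(dec1)
        decisions[id2].append(dec2)
    return dict(decisions)

def surviving_records(pairs):
    """Return the records left after applying all decisions (toy logic)."""
    decisions = collect_decisions(pairs)
    # A record survives only if it was never marked REMOVE or COMPOSITE.
    kept = sorted(rec for rec, decs in decisions.items()
                  if 'REMOVE' not in decs and 'COMPOSITE' not in decs)
    # Every COMPOSITE/COMPOSITE pair yields one new composite record.
    composites = sorted(id1 + id2 for id1, id2, dec1, dec2 in pairs
                        if dec1 == 'COMPOSITE' and dec2 == 'COMPOSITE')
    return kept + composites

scenarios = {
    '1':  [('a', 'b', 'REMOVE', 'KEEP'), ('a', 'c', 'COMPOSITE', 'COMPOSITE')],
    '2a': [('a', 'b', 'REMOVE', 'KEEP'), ('a', 'c', 'KEEP', 'REMOVE')],
    '2b': [('a', 'b', 'REMOVE', 'KEEP'), ('a', 'c', 'REMOVE', 'KEEP')],
    '3':  [('a', 'b', 'COMPOSITE', 'COMPOSITE'), ('a', 'c', 'COMPOSITE', 'COMPOSITE')],
    '4':  [('a', 'b', 'REMOVE', 'KEEP'), ('a', 'c', 'KEEP', 'KEEP')],
}
results = {name: surviving_records(pairs) for name, pairs in scenarios.items()}
for name, survivors in results.items():
    print(name, survivors)
```

In scenarios 1, 2b and 3 the survivors all derive from the shared record a, which is why they can still be duplicates of each other. Scenario 4 also leaves b and c, but there c was judged not to be a duplicate of a.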
What about records which appear more than twice? These records would not be dealt with by the following approach, as it only removes duplicates which appear TWICE in the dataset.
The algorithm currently does NOT handle records with multiple duplicates. To do this, we would have to set up a WHILE loop which runs UNTIL no record has been assigned more than one decision. Doable!
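A minimal sketch of such a loop is given here, under simplifying assumptions. The toy detector `find_pairs` (records sharing a site key count as a candidate pair), the rule of always keeping the first record of a pair, and the `max_rounds` safety limit are all hypothetical. The stopping condition used is "no candidate pairs remain", which is a stricter stand-in for the condition described above.

```python
def find_pairs(records):
    """Toy detector: two records form a candidate pair if they share a site key."""
    ids = sorted(records)
    return [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]
            if records[a] == records[b]]

def resolve_until_stable(records, max_rounds=100):
    """Detect and resolve one pair per round until no pairs are left."""
    records = dict(records)  # work on a copy
    rounds = 0
    while rounds < max_rounds:
        pairs = find_pairs(records)
        if not pairs:
            break  # nothing left to resolve
        keep_id, remove_id = pairs[0]
        del records[remove_id]  # toy decision: KEEP the first, REMOVE the second
        rounds += 1
    return records, rounds

toy_records = {'a': 'site1', 'b': 'site1', 'c': 'site1', 'd': 'site2'}
remaining, n_rounds = resolve_until_stable(toy_records)
print(remaining, n_rounds)
```

Re-running the detection after every resolution is what lets the loop cope with a record that sits in three or more pairs, at the cost of repeated detection passes.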
Merge the composites and the dataframe freed from REMOVE and COMPOSITE type records
df_duprmv_cmp = pd.concat([df_dupfree_rmv, df_composite])
df_duprmv_cmp.index = df_duprmv_cmp['datasetId']
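The merge step can be tried on toy data. The sketch below uses hypothetical rows and a hypothetical site name, and mirrors the two lines above. It also shows why unique IDs after the merge are not enough on their own: b and the composite ac have different IDs, yet in scenario 1 they would describe the same underlying record.

```python
import pandas as pd

# Hypothetical stand-ins for df_dupfree_rmv and df_composite.
df_dupfree_rmv_toy = pd.DataFrame({
    'datasetId': ['b', 'd'],
    'geo_siteName': ['Site One', 'Site Two'],
})
df_composite_toy = pd.DataFrame({
    'datasetId': ['ac'],
    'geo_siteName': ['COMPOSITE: Site One + Site One'],
})

df_merged_toy = pd.concat([df_dupfree_rmv_toy, df_composite_toy])
df_merged_toy.index = df_merged_toy['datasetId']

# IDs that occur more than once in the merged index (empty here).
repeated_ids = df_merged_toy.index[df_merged_toy.index.duplicated()].tolist()
print(list(df_merged_toy.index), repeated_ids)
```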
Check for overlap between REMOVE and COMPOSITE type IDs
# collect every record that received both a REMOVE and a COMPOSITE decision
overlap_rmv_cmp = []
for dataset_id in decisions.keys():
    if ('REMOVE' in decisions[dataset_id]) and ('COMPOSITE' in decisions[dataset_id]):
        overlap_rmv_cmp.append(dataset_id)
if len(overlap_rmv_cmp) > 0:
    print('WARNING! Overlap detected between REMOVE and COMPOSITE records.')
    print('Please review these records below.')
else:
    print('No overlap between REMOVE and COMPOSITE type records.')
WARNING! Overlap detected between REMOVE and COMPOSITE records.
Please review these records below.
# collect the KEEP partners of every overlapping record
check_for_dups = []
for dataset_id in overlap_rmv_cmp:
    check_decisions = df_decisions[(df_decisions['datasetId 1']==dataset_id)|(df_decisions['datasetId 2']==dataset_id)][['datasetId 1', 'datasetId 2', 'Decision 1', 'Decision 2']]
    # print(check_decisions)
    for ind in check_decisions.index:
        dec1, dec2 = check_decisions.loc[ind, ['Decision 1', 'Decision 2']]
        id1, id2 = check_decisions.loc[ind, ['datasetId 1', 'datasetId 2']]
        if dec1=='KEEP':
            check_for_dups.append(id1)
        if dec2=='KEEP':
            check_for_dups.append(id2)
print(check_for_dups)
['pages2k_468', 'pages2k_1089', 'pages2k_1089', 'pages2k_2793', 'pages2k_2796', 'ch2k_KU00NIN01_150', 'iso2k_1554', 'ch2k_KU00NIN01_150', 'iso2k_1554', 'iso2k_786']
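The printed list contains some IDs more than once. The notebook does not say whether the following steps rely on this, so the sketch below is only an optional illustration of an order-preserving de-duplication using `dict.fromkeys`. The variable `unique_ids` is a hypothetical name.

```python
# The list as printed above.
check_for_dups = ['pages2k_468', 'pages2k_1089', 'pages2k_1089', 'pages2k_2793',
                  'pages2k_2796', 'ch2k_KU00NIN01_150', 'iso2k_1554',
                  'ch2k_KU00NIN01_150', 'iso2k_1554', 'iso2k_786']

# dict keys are unique and keep insertion order, so this drops repeats
# while preserving the order of first appearance.
unique_ids = list(dict.fromkeys(check_for_dups))
print(len(check_for_dups), len(unique_ids))
print(unique_ids)
```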
Now we create a small dataframe which needs to be checked for duplicates.
df_check=df.loc[check_for_dups]
df_check = pd.concat([df_composite, df_check])
df_check.name = 'tmp'
We then run a brief duplicate detection on this dataframe. Note that composited records are not assigned a database-specific value: they keep the default Hierarchy value of 0, which places them at the top of the hierarchy.
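The hierarchy assignment in the next cell can also be written as a lookup table. The sketch below is an alternative formulation on hypothetical toy rows, not the notebook's code. It copies a subset of the database names and values from that cell and uses `Series.map` with a default of 0. It also lists names that did not match any key, because a name spelled differently from its key silently keeps the default.

```python
import pandas as pd

# Subset of the hierarchy values used in the next cell.
hierarchy = {
    'FE23 (Breitenmoser et al. (2014))': 4,
    'Iso2k v1.1.2': 3,
    'CoralHydro2k v1.0.1': 2,
    'SISAL v3': 1,
}

# Hypothetical stand-in for df_check; the last name is deliberately misspelled.
df_toy = pd.DataFrame({
    'datasetId': ['x1', 'x2', 'x3', 'x4'],
    'originalDatabase': ['Iso2k v1.1.2', 'CoralHydro2k v1.0.1',
                         'dod2k_composite_z', 'iso2k v1.1.2'],
})

# Names without a key map to NaN and then fall back to the default of 0.
df_toy['Hierarchy'] = df_toy['originalDatabase'].map(hierarchy).fillna(0).astype(int)

# Names that kept the default, worth a look for spelling differences.
unmatched = sorted(set(df_toy['originalDatabase']) - set(hierarchy))
print(df_toy['Hierarchy'].tolist(), unmatched)
```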
dup.find_duplicates_optimized(df_check, n_points_thresh=10)
df_check['Hierarchy'] = 0
df_check.loc[df_check['originalDatabase']=='PAGES 2k v2.2.0', 'Hierarchy'] = 5
df_check.loc[df_check['originalDatabase']=='FE23 (Breitenmoser et al. (2014))', 'Hierarchy'] = 4
df_check.loc[df_check['originalDatabase']=='CoralHydro2k v1.0.1', 'Hierarchy'] = 2
df_check.loc[df_check['originalDatabase']=='Iso2k v1.1.2', 'Hierarchy'] = 3
df_check.loc[df_check['originalDatabase']=='SISAL v3', 'Hierarchy'] = 1
dup.duplicate_decisions(df_check, operator_details=operator_details,
                        choose_recollection=True, remove_identicals=True)
tmp
Start duplicate search:
=================================
checking parameters:
proxy archive : must match
proxy type : must match
distance (km) < 8
elevation : must match
time overlap > 10
correlation > 0.9
RMSE < 0.1
1st difference rmse < 0.1
correlation of 1st difference > 0.9
=================================
Start duplicate search
Progress: 0/48
--> Found potential duplicate: 2: dod2k_composite_z_pages2k_468_pages2k_3550&24: dod2k_composite_z_pages2k_3550_fe23_asia_russ137w (n_potential_duplicates=1)
--> Found potential duplicate: 2: dod2k_composite_z_pages2k_468_pages2k_3550&38: pages2k_468 (n_potential_duplicates=2)
--> Found potential duplicate: 5: dod2k_composite_z_pages2k_2795_pages2k_2798&41: pages2k_2793 (n_potential_duplicates=3)
--> Found potential duplicate: 5: dod2k_composite_z_pages2k_2795_pages2k_2798&42: pages2k_2796 (n_potential_duplicates=4)
Progress: 10/48
--> Found potential duplicate: 15: dod2k_composite_z_pages2k_3085_iso2k_1556&35: dod2k_composite_z_ch2k_ku00nin01_150_iso2k_1556 (n_potential_duplicates=5)
--> Found potential duplicate: 15: dod2k_composite_z_pages2k_3085_iso2k_1556&37: dod2k_composite_z_iso2k_1554_iso2k_1556 (n_potential_duplicates=6)
--> Found potential duplicate: 15: dod2k_composite_z_pages2k_3085_iso2k_1556&43: ch2k_ku00nin01_150 (n_potential_duplicates=7)
--> Found potential duplicate: 15: dod2k_composite_z_pages2k_3085_iso2k_1556&44: iso2k_1554 (n_potential_duplicates=8)
--> Found potential duplicate: 15: dod2k_composite_z_pages2k_3085_iso2k_1556&45: ch2k_ku00nin01_150 (n_potential_duplicates=9)
--> Found potential duplicate: 15: dod2k_composite_z_pages2k_3085_iso2k_1556&46: iso2k_1554 (n_potential_duplicates=10)
Progress: 20/48
--> Found potential duplicate: 24: dod2k_composite_z_pages2k_3550_fe23_asia_russ137w&38: pages2k_468 (n_potential_duplicates=11)
Progress: 30/48
--> Found potential duplicate: 34: dod2k_composite_z_ch2k_ku99hou01_40_iso2k_788&36: dod2k_composite_z_iso2k_786_iso2k_788 (n_potential_duplicates=12)
--> Found potential duplicate: 34: dod2k_composite_z_ch2k_ku99hou01_40_iso2k_788&47: iso2k_786 (n_potential_duplicates=13)
--> Found potential duplicate: 35: dod2k_composite_z_ch2k_ku00nin01_150_iso2k_1556&37: dod2k_composite_z_iso2k_1554_iso2k_1556 (n_potential_duplicates=14)
--> Found potential duplicate: 35: dod2k_composite_z_ch2k_ku00nin01_150_iso2k_1556&43: ch2k_ku00nin01_150 (n_potential_duplicates=15)
--> Found potential duplicate: 35: dod2k_composite_z_ch2k_ku00nin01_150_iso2k_1556&44: iso2k_1554 (n_potential_duplicates=16)
--> Found potential duplicate: 35: dod2k_composite_z_ch2k_ku00nin01_150_iso2k_1556&45: ch2k_ku00nin01_150 (n_potential_duplicates=17)
--> Found potential duplicate: 35: dod2k_composite_z_ch2k_ku00nin01_150_iso2k_1556&46: iso2k_1554 (n_potential_duplicates=18)
--> Found potential duplicate: 36: dod2k_composite_z_iso2k_786_iso2k_788&47: iso2k_786 (n_potential_duplicates=19)
--> Found potential duplicate: 37: dod2k_composite_z_iso2k_1554_iso2k_1556&43: ch2k_ku00nin01_150 (n_potential_duplicates=20)
--> Found potential duplicate: 37: dod2k_composite_z_iso2k_1554_iso2k_1556&44: iso2k_1554 (n_potential_duplicates=21)
--> Found potential duplicate: 37: dod2k_composite_z_iso2k_1554_iso2k_1556&45: ch2k_ku00nin01_150 (n_potential_duplicates=22)
--> Found potential duplicate: 37: dod2k_composite_z_iso2k_1554_iso2k_1556&46: iso2k_1554 (n_potential_duplicates=23)
--> Found potential duplicate: 39: pages2k_1089&40: pages2k_1089 (n_potential_duplicates=24)
Progress: 40/48
--> Found potential duplicate: 43: ch2k_ku00nin01_150&44: iso2k_1554 (n_potential_duplicates=25)
--> Found potential duplicate: 43: ch2k_ku00nin01_150&45: ch2k_ku00nin01_150 (n_potential_duplicates=26)
--> Found potential duplicate: 43: ch2k_ku00nin01_150&46: iso2k_1554 (n_potential_duplicates=27)
--> Found potential duplicate: 44: iso2k_1554&45: ch2k_ku00nin01_150 (n_potential_duplicates=28)
--> Found potential duplicate: 44: iso2k_1554&46: iso2k_1554 (n_potential_duplicates=29)
--> Found potential duplicate: 45: ch2k_ku00nin01_150&46: iso2k_1554 (n_potential_duplicates=30)
============================================================
Saved indices, IDs, distances, correlations in data/tmp/dup_detection/
============================================================
Detected 30 possible duplicates in tmp.
============================================================
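The numerical criteria printed in the log above (time overlap, correlation, RMSE, and the same two measures on the first difference) can be illustrated with a short sketch. This is not the code of dup.find_duplicates_optimized. The function names, the restriction to exactly matching years, and the absence of any normalisation before the RMSE are assumptions, and the sketch reports each criterion separately because the notebook does not state how they are combined.

```python
import numpy as np

def similarity_metrics(year1, values1, year2, values2):
    """Compare two series over the years they share (toy version)."""
    shared, idx1, idx2 = np.intersect1d(year1, year2, return_indices=True)
    x = np.asarray(values1, dtype=float)[idx1]
    y = np.asarray(values2, dtype=float)[idx2]
    dx, dy = np.diff(x), np.diff(y)  # first differences
    return {
        'n_overlap': len(shared),
        'correlation': np.corrcoef(x, y)[0, 1],
        'rmse': np.sqrt(np.mean((x - y) ** 2)),
        'corr_diff': np.corrcoef(dx, dy)[0, 1],
        'rmse_diff': np.sqrt(np.mean((dx - dy) ** 2)),
    }

def check_criteria(m):
    """Evaluate each threshold from the log on its own."""
    return {
        'time overlap > 10': bool(m['n_overlap'] > 10),
        'correlation > 0.9': bool(m['correlation'] > 0.9),
        'RMSE < 0.1': bool(m['rmse'] < 0.1),
        'correlation of 1st difference > 0.9': bool(m['corr_diff'] > 0.9),
        '1st difference rmse < 0.1': bool(m['rmse_diff'] < 0.1),
    }

# Two hypothetical records holding the same signal over different periods.
year_a = np.arange(1900, 1950)
year_b = np.arange(1920, 1970)
metrics = similarity_metrics(year_a, np.sin(year_a / 5.0),
                             year_b, np.sin(year_b / 5.0))
checks = check_criteria(metrics)
print(metrics['n_overlap'], checks)
```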
header [' Decisions for duplicate candidate pairs. ', ' Operated by Lucie Luecke (LL)', ' E-Mail: ljluec1@st-andrews.ac.uk', ' Created on: 2025-11-11 15:00:09.281159 (UTC)', 'index 1'] data [['2', '24', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/000_dod2k_composite_z_pages2k_468_pages2k_3550_dod2k_composite_z_pages2k_3550_FE23_asia_russ137w__2_24.jpg', 'dod2k_composite_z_pages2k_468_pages2k_3550', 'dod2k_composite_z_pages2k_3550_FE23_asia_russ137w', 'dod2k_composite_z', 'dod2k_composite_z', 'Altai Mt., Ust Ulagan Lake', 'COMPOSITE: Altai Mt., Ust Ulagan Lake + UstUlaganLake(Altai)', '50.47999954223633', '50.481666564941406', '87.6500015258789', '87.6500015258789', '2150.0', '2150.0', 'Wood', 'Wood', 'ring width', 'ring width', 'pages2k_468: https://www.ncdc.noaa.gov/paleo/study/4710, pages2k_3550: https://www.ncdc.noaa.gov/paleo/study/4710', 'pages2k_3550: https://www.ncdc.noaa.gov/paleo/study/4710, FE23_asia_russ137w: https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/asia/russ137w-noaa.rwl', '1581.0-1994.0', '1581.0-1994.0', 'KEEP', 'REMOVE', 'AUTO: IDENTICAL except for URLs and/or geo_siteName.', 'RECORDS IDENTICAL (perfect correlation) except for metadata. 
Automatically choose #1.'], ['2', '38', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/001_dod2k_composite_z_pages2k_468_pages2k_3550_pages2k_468__2_38.jpg', 'dod2k_composite_z_pages2k_468_pages2k_3550', 'pages2k_468', 'dod2k_composite_z', 'PAGES 2k v2.2.0', 'Altai Mt., Ust Ulagan Lake', 'Altai Mt., Ust Ulagan Lake', '50.47999954223633', '50.47999954223633', '87.6500015258789', '87.6500015258789', '2150.0', '2150.0', 'Wood', 'Wood', 'ring width', 'ring width', 'pages2k_468: https://www.ncdc.noaa.gov/paleo/study/4710, pages2k_3550: https://www.ncdc.noaa.gov/paleo/study/4710', 'https://www.ncdc.noaa.gov/paleo/study/4710', '1581.0-1994.0', '1581.0-1994.0', 'KEEP', 'REMOVE', 'AUTO: IDENTICAL except for URLs and/or geo_siteName.', 'RECORDS IDENTICAL (perfect correlation) except for metadata. Automatically choose #1.'], ['5', '41', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/002_dod2k_composite_z_pages2k_2795_pages2k_2798_pages2k_2793__5_41.jpg', 'dod2k_composite_z_pages2k_2795_pages2k_2798', 'pages2k_2793', 'dod2k_composite_z', 'PAGES 2k v2.2.0', 'Eastern tropical North Atlantic', 'Eastern tropical North Atlantic', '16.84000015258789', '16.840200424194336', '-16.732999801635742', '-16.73270034790039', '-330.0', '-330.0', 'MarineSediment', 'MarineSediment', 'Mg/Ca', 'Mg/Ca', 'pages2k_2795: http://doi.pangaea.de/10.1594/PANGAEA.773754, pages2k_2798: http://doi.pangaea.de/10.1594/PANGAEA.773754', 'http://doi.pangaea.de/10.1594/PANGAEA.773754', '21.0-2004.0', '21.0-2004.0', 'KEEP', 'REMOVE', 'AUTO: IDENTICAL except for URLs and/or geo_siteName.', 'RECORDS IDENTICAL (perfect correlation) except for metadata. 
Automatically choose #1.'], ['5', '42', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/003_dod2k_composite_z_pages2k_2795_pages2k_2798_pages2k_2796__5_42.jpg', 'dod2k_composite_z_pages2k_2795_pages2k_2798', 'pages2k_2796', 'dod2k_composite_z', 'PAGES 2k v2.2.0', 'Eastern tropical North Atlantic', 'Eastern tropical North Atlantic', '16.84000015258789', '16.840200424194336', '-16.732999801635742', '-16.73270034790039', '-330.0', '-330.0', 'MarineSediment', 'MarineSediment', 'Mg/Ca', 'Mg/Ca', 'pages2k_2795: http://doi.pangaea.de/10.1594/PANGAEA.773754, pages2k_2798: http://doi.pangaea.de/10.1594/PANGAEA.773754', 'http://doi.pangaea.de/10.1594/PANGAEA.773754', '21.0-2004.0', '21.0-1947.0', 'KEEP', 'KEEP', 'MANUAL', 'b'], ['15', '35', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/004_dod2k_composite_z_pages2k_3085_iso2k_1556_dod2k_composite_z_ch2k_KU00NIN01_150_iso2k_1556__15_35.jpg', 'dod2k_composite_z_pages2k_3085_iso2k_1556', 'dod2k_composite_z_ch2k_KU00NIN01_150_iso2k_1556', 'dod2k_composite_z', 'dod2k_composite_z', 'COMPOSITE: Ningaloo + Ningaloo Reef, West Australia', 'COMPOSITE: Ningaloo Reef, Australia + Ningaloo Reef, West Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'pages2k_3085: http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=519:1:::::P1_study_id:1867, iso2k_1556: https://www.ncdc.noaa.gov/paleo/study/1867', 'ch2k_KU00NIN01_150: https://www.ncdc.noaa.gov/paleo/study/1867, iso2k_1556: https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'KEEP', 'REMOVE', 'AUTO: IDENTICAL except for URLs and/or geo_siteName.', 'RECORDS IDENTICAL (perfect correlation) except for metadata. 
Automatically choose #1.'], ['15', '37', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/005_dod2k_composite_z_pages2k_3085_iso2k_1556_dod2k_composite_z_iso2k_1554_iso2k_1556__15_37.jpg', 'dod2k_composite_z_pages2k_3085_iso2k_1556', 'dod2k_composite_z_iso2k_1554_iso2k_1556', 'dod2k_composite_z', 'dod2k_composite_z', 'COMPOSITE: Ningaloo + Ningaloo Reef, West Australia', 'Ningaloo Reef, West Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'pages2k_3085: http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=519:1:::::P1_study_id:1867, iso2k_1556: https://www.ncdc.noaa.gov/paleo/study/1867', 'iso2k_1554: https://www.ncdc.noaa.gov/paleo/study/1867, iso2k_1556: https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'KEEP', 'REMOVE', 'AUTO: IDENTICAL except for URLs and/or geo_siteName.', 'RECORDS IDENTICAL (perfect correlation) except for metadata. 
Automatically choose #1.'], ['15', '43', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/006_dod2k_composite_z_pages2k_3085_iso2k_1556_ch2k_KU00NIN01_150__15_43.jpg', 'dod2k_composite_z_pages2k_3085_iso2k_1556', 'ch2k_KU00NIN01_150', 'dod2k_composite_z', 'CoralHydro2k v1.0.1', 'COMPOSITE: Ningaloo + Ningaloo Reef, West Australia', 'Ningaloo Reef, Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'pages2k_3085: http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=519:1:::::P1_study_id:1867, iso2k_1556: https://www.ncdc.noaa.gov/paleo/study/1867', 'https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'COMPOSITE', 'COMPOSITE', 'MANUAL', 'c'], ['15', '44', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/007_dod2k_composite_z_pages2k_3085_iso2k_1556_iso2k_1554__15_44.jpg', 'dod2k_composite_z_pages2k_3085_iso2k_1556', 'iso2k_1554', 'dod2k_composite_z', 'Iso2k v1.1.2', 'COMPOSITE: Ningaloo + Ningaloo Reef, West Australia', 'Ningaloo Reef, West Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'pages2k_3085: http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=519:1:::::P1_study_id:1867, iso2k_1556: https://www.ncdc.noaa.gov/paleo/study/1867', 'https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'KEEP', 'REMOVE', 'MANUAL', '1'], ['15', '45', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/008_dod2k_composite_z_pages2k_3085_iso2k_1556_ch2k_KU00NIN01_150__15_45.jpg', 'dod2k_composite_z_pages2k_3085_iso2k_1556', 'ch2k_KU00NIN01_150', 'dod2k_composite_z', 'CoralHydro2k v1.0.1', 'COMPOSITE: Ningaloo + Ningaloo Reef, West Australia', 'Ningaloo Reef, Australia', '-21.905000686645508', 
'-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'pages2k_3085: http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=519:1:::::P1_study_id:1867, iso2k_1556: https://www.ncdc.noaa.gov/paleo/study/1867', 'https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'KEEP', 'REMOVE', 'MANUAL', '1'], ['15', '46', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/009_dod2k_composite_z_pages2k_3085_iso2k_1556_iso2k_1554__15_46.jpg', 'dod2k_composite_z_pages2k_3085_iso2k_1556', 'iso2k_1554', 'dod2k_composite_z', 'Iso2k v1.1.2', 'COMPOSITE: Ningaloo + Ningaloo Reef, West Australia', 'Ningaloo Reef, West Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'pages2k_3085: http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=519:1:::::P1_study_id:1867, iso2k_1556: https://www.ncdc.noaa.gov/paleo/study/1867', 'https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'KEEP', 'REMOVE', 'MANUAL', '1'], ['24', '38', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/010_dod2k_composite_z_pages2k_3550_FE23_asia_russ137w_pages2k_468__24_38.jpg', 'dod2k_composite_z_pages2k_3550_FE23_asia_russ137w', 'pages2k_468', 'dod2k_composite_z', 'PAGES 2k v2.2.0', 'COMPOSITE: Altai Mt., Ust Ulagan Lake + UstUlaganLake(Altai)', 'Altai Mt., Ust Ulagan Lake', '50.481666564941406', '50.47999954223633', '87.6500015258789', '87.6500015258789', '2150.0', '2150.0', 'Wood', 'Wood', 'ring width', 'ring width', 'pages2k_3550: https://www.ncdc.noaa.gov/paleo/study/4710, FE23_asia_russ137w: https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/asia/russ137w-noaa.rwl', 'https://www.ncdc.noaa.gov/paleo/study/4710', '1581.0-1994.0', '1581.0-1994.0', 'KEEP', 'REMOVE', 'AUTO: IDENTICAL except for URLs and/or 
geo_siteName.', 'RECORDS IDENTICAL (perfect correlation) except for metadata. Automatically choose #1.'], ['34', '36', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/011_dod2k_composite_z_ch2k_KU99HOU01_40_iso2k_788_dod2k_composite_z_iso2k_786_iso2k_788__34_36.jpg', 'dod2k_composite_z_ch2k_KU99HOU01_40_iso2k_788', 'dod2k_composite_z_iso2k_786_iso2k_788', 'dod2k_composite_z', 'dod2k_composite_z', 'COMPOSITE: Houtman Abrolhos Islands, Australia + Houtman Abrolhos Islands', 'Houtman Abrolhos Islands', '-28.461999893188477', '-28.461999893188477', '113.76799774169922', '113.76799774169922', '-5.0', '-5.0', 'Coral', 'Coral', 'd18O', 'd18O', 'ch2k_KU99HOU01_40: https://www.ncdc.noaa.gov/paleo/study/1856, iso2k_788: https://www.ncdc.noaa.gov/paleo/study/1856', 'iso2k_786: https://www.ncdc.noaa.gov/paleo/study/1856, iso2k_788: https://www.ncdc.noaa.gov/paleo/study/1856', '1794.7-1994.4', '1794.7-1994.4', 'KEEP', 'REMOVE', 'AUTO: IDENTICAL except for URLs and/or geo_siteName.', 'RECORDS IDENTICAL (perfect correlation) except for metadata. 
Automatically choose #1.'], ['34', '47', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/012_dod2k_composite_z_ch2k_KU99HOU01_40_iso2k_788_iso2k_786__34_47.jpg', 'dod2k_composite_z_ch2k_KU99HOU01_40_iso2k_788', 'iso2k_786', 'dod2k_composite_z', 'Iso2k v1.1.2', 'COMPOSITE: Houtman Abrolhos Islands, Australia + Houtman Abrolhos Islands', 'Houtman Abrolhos Islands', '-28.461999893188477', '-28.461700439453125', '113.76799774169922', '113.76830291748047', '-5.0', '-5.0', 'Coral', 'Coral', 'd18O', 'd18O', 'ch2k_KU99HOU01_40: https://www.ncdc.noaa.gov/paleo/study/1856, iso2k_788: https://www.ncdc.noaa.gov/paleo/study/1856', 'https://www.ncdc.noaa.gov/paleo/study/1856', '1794.7-1994.4', '1794.7-1994.4', 'KEEP', 'REMOVE', 'MANUAL', '1'], ['35', '37', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/013_dod2k_composite_z_ch2k_KU00NIN01_150_iso2k_1556_dod2k_composite_z_iso2k_1554_iso2k_1556__35_37.jpg', 'dod2k_composite_z_ch2k_KU00NIN01_150_iso2k_1556', 'dod2k_composite_z_iso2k_1554_iso2k_1556', 'dod2k_composite_z', 'dod2k_composite_z', 'COMPOSITE: Ningaloo Reef, Australia + Ningaloo Reef, West Australia', 'Ningaloo Reef, West Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'ch2k_KU00NIN01_150: https://www.ncdc.noaa.gov/paleo/study/1867, iso2k_1556: https://www.ncdc.noaa.gov/paleo/study/1867', 'iso2k_1554: https://www.ncdc.noaa.gov/paleo/study/1867, iso2k_1556: https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'KEEP', 'REMOVE', 'AUTO: IDENTICAL except for URLs and/or geo_siteName.', 'RECORDS IDENTICAL (perfect correlation) except for metadata. 
Automatically choose #1.'], ['35', '43', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/014_dod2k_composite_z_ch2k_KU00NIN01_150_iso2k_1556_ch2k_KU00NIN01_150__35_43.jpg', 'dod2k_composite_z_ch2k_KU00NIN01_150_iso2k_1556', 'ch2k_KU00NIN01_150', 'dod2k_composite_z', 'CoralHydro2k v1.0.1', 'COMPOSITE: Ningaloo Reef, Australia + Ningaloo Reef, West Australia', 'Ningaloo Reef, Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'ch2k_KU00NIN01_150: https://www.ncdc.noaa.gov/paleo/study/1867, iso2k_1556: https://www.ncdc.noaa.gov/paleo/study/1867', 'https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'KEEP', 'REMOVE', 'MANUAL', '1'], ['35', '44', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/015_dod2k_composite_z_ch2k_KU00NIN01_150_iso2k_1556_iso2k_1554__35_44.jpg', 'dod2k_composite_z_ch2k_KU00NIN01_150_iso2k_1556', 'iso2k_1554', 'dod2k_composite_z', 'Iso2k v1.1.2', 'COMPOSITE: Ningaloo Reef, Australia + Ningaloo Reef, West Australia', 'Ningaloo Reef, West Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'ch2k_KU00NIN01_150: https://www.ncdc.noaa.gov/paleo/study/1867, iso2k_1556: https://www.ncdc.noaa.gov/paleo/study/1867', 'https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'KEEP', 'REMOVE', 'MANUAL', '1'], ['35', '45', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/016_dod2k_composite_z_ch2k_KU00NIN01_150_iso2k_1556_ch2k_KU00NIN01_150__35_45.jpg', 'dod2k_composite_z_ch2k_KU00NIN01_150_iso2k_1556', 'ch2k_KU00NIN01_150', 'dod2k_composite_z', 'CoralHydro2k v1.0.1', 'COMPOSITE: Ningaloo Reef, Australia + Ningaloo Reef, West Australia', 'Ningaloo Reef, 
Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'ch2k_KU00NIN01_150: https://www.ncdc.noaa.gov/paleo/study/1867, iso2k_1556: https://www.ncdc.noaa.gov/paleo/study/1867', 'https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'COMPOSITE', 'COMPOSITE', 'MANUAL', ''], ['35', '46', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/017_dod2k_composite_z_ch2k_KU00NIN01_150_iso2k_1556_iso2k_1554__35_46.jpg', 'dod2k_composite_z_ch2k_KU00NIN01_150_iso2k_1556', 'iso2k_1554', 'dod2k_composite_z', 'Iso2k v1.1.2', 'COMPOSITE: Ningaloo Reef, Australia + Ningaloo Reef, West Australia', 'Ningaloo Reef, West Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'ch2k_KU00NIN01_150: https://www.ncdc.noaa.gov/paleo/study/1867, iso2k_1556: https://www.ncdc.noaa.gov/paleo/study/1867', 'https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'COMPOSITE', 'COMPOSITE', 'MANUAL', ''], ['36', '47', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/018_dod2k_composite_z_iso2k_786_iso2k_788_iso2k_786__36_47.jpg', 'dod2k_composite_z_iso2k_786_iso2k_788', 'iso2k_786', 'dod2k_composite_z', 'Iso2k v1.1.2', 'Houtman Abrolhos Islands', 'Houtman Abrolhos Islands', '-28.461999893188477', '-28.461700439453125', '113.76799774169922', '113.76830291748047', '-5.0', '-5.0', 'Coral', 'Coral', 'd18O', 'd18O', 'iso2k_786: https://www.ncdc.noaa.gov/paleo/study/1856, iso2k_788: https://www.ncdc.noaa.gov/paleo/study/1856', 'https://www.ncdc.noaa.gov/paleo/study/1856', '1794.7-1994.4', '1794.7-1994.4', 'KEEP', 'REMOVE', 'MANUAL', ''], ['37', '43', 
'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/019_dod2k_composite_z_iso2k_1554_iso2k_1556_ch2k_KU00NIN01_150__37_43.jpg', 'dod2k_composite_z_iso2k_1554_iso2k_1556', 'ch2k_KU00NIN01_150', 'dod2k_composite_z', 'CoralHydro2k v1.0.1', 'Ningaloo Reef, West Australia', 'Ningaloo Reef, Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'iso2k_1554: https://www.ncdc.noaa.gov/paleo/study/1867, iso2k_1556: https://www.ncdc.noaa.gov/paleo/study/1867', 'https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'KEEP', 'KEEP', 'MANUAL', ''], ['37', '44', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/020_dod2k_composite_z_iso2k_1554_iso2k_1556_iso2k_1554__37_44.jpg', 'dod2k_composite_z_iso2k_1554_iso2k_1556', 'iso2k_1554', 'dod2k_composite_z', 'Iso2k v1.1.2', 'Ningaloo Reef, West Australia', 'Ningaloo Reef, West Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'iso2k_1554: https://www.ncdc.noaa.gov/paleo/study/1867, iso2k_1556: https://www.ncdc.noaa.gov/paleo/study/1867', 'https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'REMOVE', 'REMOVE', 'MANUAL', ''], ['37', '45', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/021_dod2k_composite_z_iso2k_1554_iso2k_1556_ch2k_KU00NIN01_150__37_45.jpg', 'dod2k_composite_z_iso2k_1554_iso2k_1556', 'ch2k_KU00NIN01_150', 'dod2k_composite_z', 'CoralHydro2k v1.0.1', 'Ningaloo Reef, West Australia', 'Ningaloo Reef, Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'iso2k_1554: https://www.ncdc.noaa.gov/paleo/study/1867, iso2k_1556: 
https://www.ncdc.noaa.gov/paleo/study/1867', 'https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'KEEP', 'REMOVE', 'MANUAL', ''], ['37', '46', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/022_dod2k_composite_z_iso2k_1554_iso2k_1556_iso2k_1554__37_46.jpg', 'dod2k_composite_z_iso2k_1554_iso2k_1556', 'iso2k_1554', 'dod2k_composite_z', 'Iso2k v1.1.2', 'Ningaloo Reef, West Australia', 'Ningaloo Reef, West Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'iso2k_1554: https://www.ncdc.noaa.gov/paleo/study/1867, iso2k_1556: https://www.ncdc.noaa.gov/paleo/study/1867', 'https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'KEEP', 'REMOVE', 'MANUAL', ''], ['39', '40', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/023_pages2k_1089_pages2k_1089__39_40.jpg', 'pages2k_1089', 'pages2k_1089', 'PAGES 2k v2.2.0', 'PAGES 2k v2.2.0', 'Yellow Mountain Ridge', 'Yellow Mountain Ridge', '45.29999923706055', '45.29999923706055', '-111.30000305175781', '-111.30000305175781', '2500.0', '2500.0', 'Wood', 'Wood', 'ring width', 'ring width', 'https://www.ncdc.noaa.gov/paleo/study/3739', 'https://www.ncdc.noaa.gov/paleo/study/3739', '470.0-1998.0', '470.0-1998.0', 'KEEP', 'REMOVE', 'AUTO: IDENTICAL', 'RECORDS IDENTICAL (identical data). 
Automatically choose #1.'], ['43', '44', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/024_ch2k_KU00NIN01_150_iso2k_1554__43_44.jpg', 'ch2k_KU00NIN01_150', 'iso2k_1554', 'CoralHydro2k v1.0.1', 'Iso2k v1.1.2', 'Ningaloo Reef, Australia', 'Ningaloo Reef, West Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'https://www.ncdc.noaa.gov/paleo/study/1867', 'https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'REMOVE', 'KEEP', 'AUTO: IDENTICAL except for URLs and/or geo_siteName.', 'RECORDS IDENTICAL (perfect correlation) except for metadata. Automatically choose #2.'], ['43', '45', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/025_ch2k_KU00NIN01_150_ch2k_KU00NIN01_150__43_45.jpg', 'ch2k_KU00NIN01_150', 'ch2k_KU00NIN01_150', 'CoralHydro2k v1.0.1', 'CoralHydro2k v1.0.1', 'Ningaloo Reef, Australia', 'Ningaloo Reef, Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'https://www.ncdc.noaa.gov/paleo/study/1867', 'https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'KEEP', 'REMOVE', 'AUTO: IDENTICAL', 'RECORDS IDENTICAL (identical data). 
Automatically choose #1.'], ['43', '46', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/026_ch2k_KU00NIN01_150_iso2k_1554__43_46.jpg', 'ch2k_KU00NIN01_150', 'iso2k_1554', 'CoralHydro2k v1.0.1', 'Iso2k v1.1.2', 'Ningaloo Reef, Australia', 'Ningaloo Reef, West Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'https://www.ncdc.noaa.gov/paleo/study/1867', 'https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'REMOVE', 'KEEP', 'AUTO: IDENTICAL except for URLs and/or geo_siteName.', 'RECORDS IDENTICAL (perfect correlation) except for metadata. Automatically choose #2.'], ['44', '45', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/027_iso2k_1554_ch2k_KU00NIN01_150__44_45.jpg', 'iso2k_1554', 'ch2k_KU00NIN01_150', 'Iso2k v1.1.2', 'CoralHydro2k v1.0.1', 'Ningaloo Reef, West Australia', 'Ningaloo Reef, Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'https://www.ncdc.noaa.gov/paleo/study/1867', 'https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'KEEP', 'REMOVE', 'AUTO: IDENTICAL except for URLs and/or geo_siteName.', 'RECORDS IDENTICAL (perfect correlation) except for metadata. 
Automatically choose #1.'], ['44', '46', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/028_iso2k_1554_iso2k_1554__44_46.jpg', 'iso2k_1554', 'iso2k_1554', 'Iso2k v1.1.2', 'Iso2k v1.1.2', 'Ningaloo Reef, West Australia', 'Ningaloo Reef, West Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'https://www.ncdc.noaa.gov/paleo/study/1867', 'https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'KEEP', 'REMOVE', 'AUTO: IDENTICAL', 'RECORDS IDENTICAL (identical data). Automatically choose #1.'], ['45', '46', 'https://nzero.umd.edu:444/hub/user-redirect/lab/tree/compile_proxy_database_v2.1/tmp/dup_detection/029_ch2k_KU00NIN01_150_iso2k_1554__45_46.jpg', 'ch2k_KU00NIN01_150', 'iso2k_1554', 'CoralHydro2k v1.0.1', 'Iso2k v1.1.2', 'Ningaloo Reef, Australia', 'Ningaloo Reef, West Australia', '-21.905000686645508', '-21.905000686645508', '113.96499633789062', '113.96499633789062', '-3.0', '-3.0', 'Coral', 'Coral', 'd18O', 'd18O', 'https://www.ncdc.noaa.gov/paleo/study/1867', 'https://www.ncdc.noaa.gov/paleo/study/1867', '1878.7-1995.2', '1878.7-1995.2', 'REMOVE', 'KEEP', 'AUTO: IDENTICAL except for URLs and/or geo_siteName.', 'RECORDS IDENTICAL (perfect correlation) except for metadata. Automatically choose #2.']] start with index: 30 ===================================================================== END OF DUPLICATE DECISION PROCESS. =====================================================================
(30, 27)
Saved the decisions under data/tmp/dup_detection/dup_decisions_tmp_LL_25-11-11.csv
Summary of all decisions made:
#0: KEEP record dod2k_composite_z_pages2k_468_pages2k_3550. REMOVE record dod2k_composite_z_pages2k_3550_FE23_asia_russ137w.
#1: KEEP record dod2k_composite_z_pages2k_468_pages2k_3550. REMOVE record pages2k_468.
#2: KEEP record dod2k_composite_z_pages2k_2795_pages2k_2798. REMOVE record pages2k_2793.
#3: KEEP record dod2k_composite_z_pages2k_2795_pages2k_2798. KEEP record pages2k_2796.
#4: KEEP record dod2k_composite_z_pages2k_3085_iso2k_1556. REMOVE record dod2k_composite_z_ch2k_KU00NIN01_150_iso2k_1556.
#5: KEEP record dod2k_composite_z_pages2k_3085_iso2k_1556. REMOVE record dod2k_composite_z_iso2k_1554_iso2k_1556.
#6: COMPOSITE record dod2k_composite_z_pages2k_3085_iso2k_1556. COMPOSITE record ch2k_KU00NIN01_150.
#7: KEEP record dod2k_composite_z_pages2k_3085_iso2k_1556. REMOVE record iso2k_1554.
#8: KEEP record dod2k_composite_z_pages2k_3085_iso2k_1556. REMOVE record ch2k_KU00NIN01_150.
#9: KEEP record dod2k_composite_z_pages2k_3085_iso2k_1556. REMOVE record iso2k_1554.
#10: KEEP record dod2k_composite_z_pages2k_3550_FE23_asia_russ137w. REMOVE record pages2k_468.
#11: KEEP record dod2k_composite_z_ch2k_KU99HOU01_40_iso2k_788. REMOVE record dod2k_composite_z_iso2k_786_iso2k_788.
#12: KEEP record dod2k_composite_z_ch2k_KU99HOU01_40_iso2k_788. REMOVE record iso2k_786.
#13: KEEP record dod2k_composite_z_ch2k_KU00NIN01_150_iso2k_1556. REMOVE record dod2k_composite_z_iso2k_1554_iso2k_1556.
#14: KEEP record dod2k_composite_z_ch2k_KU00NIN01_150_iso2k_1556. REMOVE record ch2k_KU00NIN01_150.
#15: KEEP record dod2k_composite_z_ch2k_KU00NIN01_150_iso2k_1556. REMOVE record iso2k_1554.
#16: COMPOSITE record dod2k_composite_z_ch2k_KU00NIN01_150_iso2k_1556. COMPOSITE record ch2k_KU00NIN01_150.
#17: COMPOSITE record dod2k_composite_z_ch2k_KU00NIN01_150_iso2k_1556. COMPOSITE record iso2k_1554.
#18: KEEP record dod2k_composite_z_iso2k_786_iso2k_788. REMOVE record iso2k_786.
#19: KEEP record dod2k_composite_z_iso2k_1554_iso2k_1556. KEEP record ch2k_KU00NIN01_150.
#20: REMOVE record dod2k_composite_z_iso2k_1554_iso2k_1556. REMOVE record iso2k_1554.
#21: KEEP record dod2k_composite_z_iso2k_1554_iso2k_1556. REMOVE record ch2k_KU00NIN01_150.
#22: KEEP record dod2k_composite_z_iso2k_1554_iso2k_1556. REMOVE record iso2k_1554.
#23: KEEP record pages2k_1089. REMOVE record pages2k_1089.
#24: REMOVE record ch2k_KU00NIN01_150. KEEP record iso2k_1554.
#25: KEEP record ch2k_KU00NIN01_150. REMOVE record ch2k_KU00NIN01_150.
#26: REMOVE record ch2k_KU00NIN01_150. KEEP record iso2k_1554.
#27: KEEP record iso2k_1554. REMOVE record ch2k_KU00NIN01_150.
#28: KEEP record iso2k_1554. REMOVE record iso2k_1554.
#29: REMOVE record ch2k_KU00NIN01_150. KEEP record iso2k_1554.
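For the pairs marked COMPOSITE, the notebook header describes the compositing as creating z-scores and averaging over the shared time period. The following is a minimal sketch of that idea on toy data, not the implementation used in this notebook. The function name `zscore_composite` and the choice to standardise over the shared period only are assumptions made for illustration.

```python
import numpy as np

def zscore_composite(year_a, values_a, year_b, values_b):
    """Illustrative helper (hypothetical): average two records as z-scores
    over the years they have in common."""
    year_a, values_a = np.asarray(year_a, float), np.asarray(values_a, float)
    year_b, values_b = np.asarray(year_b, float), np.asarray(values_b, float)
    # years present in both records, plus their positions in each record
    shared, idx_a, idx_b = np.intersect1d(year_a, year_b, return_indices=True)
    if shared.size < 2:
        raise ValueError('records need at least two shared time steps')
    za, zb = values_a[idx_a], values_b[idx_b]
    # standardise each record over the shared period (assumption)
    za = (za - za.mean()) / za.std()
    zb = (zb - zb.mean()) / zb.std()
    return shared, (za + zb) / 2.0

years, comp = zscore_composite([1990, 1991, 1992, 1993], [1.0, 2.0, 3.0, 4.0],
                               [1991, 1992, 1993, 1994], [20.0, 30.0, 40.0, 50.0])
print(years)
print(comp)
```

Because both toy records are linear in time, their z-scores over the shared years coincide and the composite equals either standardised record. A record that is constant over the shared period would have zero standard deviation and would need separate handling.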
Implement the decisions.
tmp_df_decisions = pd.read_csv(f'data/{df_check.name}/dup_detection/dup_decisions_{df_check.name}_{initials}_{date}'+'.csv', header=5)
tmp_dup_details = dup.provide_dup_details(tmp_df_decisions, header)
# collect the IDs of all records flagged as REMOVE or COMPOSITE
tmp_remove_IDs = list(tmp_df_decisions['datasetId 1'][np.isin(tmp_df_decisions['Decision 1'],['REMOVE', 'COMPOSITE'])])
tmp_remove_IDs += list(tmp_df_decisions['datasetId 2'][np.isin(tmp_df_decisions['Decision 2'],['REMOVE', 'COMPOSITE'])])
# keep each ID once and skip IDs that are already listed in remove_IDs
tmp_remove_IDs = [rid for rid in np.unique(tmp_remove_IDs) if rid not in remove_IDs]
tmp_df_dupfree_rmv = df_duprmv_cmp.drop(tmp_remove_IDs) # dataframe without the REMOVE and COMPOSITE records
# select the pairs in which both records are flagged as COMPOSITE
tmp_comp_ID_pairs = tmp_df_decisions[(tmp_df_decisions['Decision 1']=='COMPOSITE')&(tmp_df_decisions['Decision 2']=='COMPOSITE')]
# create new composite data and metadata from these pairs
# (loops through the composite pairs and checks the metadata)
tmp_df_composite = dup.join_composites_metadata(df_duprmv_cmp, tmp_comp_ID_pairs, tmp_df_decisions, header)
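The selection logic of this cell can be tried out on a toy decisions table. The sketch below reuses the column names from the decisions file (`datasetId 1`, `Decision 1`, and so on), while the record IDs and the toy database are made up. It assumes the database is indexed by `datasetId`, so that `drop` can be called with the IDs directly.

```python
import pandas as pd

# toy decisions table with the column names used in the notebook
decisions = pd.DataFrame({
    'datasetId 1': ['rec_A', 'rec_A', 'rec_C'],
    'datasetId 2': ['rec_B', 'rec_C', 'rec_D'],
    'Decision 1': ['KEEP', 'KEEP', 'COMPOSITE'],
    'Decision 2': ['REMOVE', 'KEEP', 'COMPOSITE'],
})
# toy database indexed by datasetId (hypothetical values)
df = pd.DataFrame({'paleoData_proxy': ['d18O'] * 4},
                  index=['rec_A', 'rec_B', 'rec_C', 'rec_D'])

flagged = ['REMOVE', 'COMPOSITE']
# IDs flagged in either position of a pair, each listed once
ids = pd.concat([
    decisions.loc[decisions['Decision 1'].isin(flagged), 'datasetId 1'],
    decisions.loc[decisions['Decision 2'].isin(flagged), 'datasetId 2'],
]).unique()
# database without the flagged records
df_rmv = df.drop(index=ids)
# pairs in which both records are to be composited
comp_pairs = decisions[(decisions['Decision 1'] == 'COMPOSITE')
                       & (decisions['Decision 2'] == 'COMPOSITE')]
print(sorted(ids))
print(list(df_rmv.index))
print(len(comp_pairs))
```

Note that `rec_C` is dropped although it is marked KEEP in one pair: a single REMOVE or COMPOSITE flag in any pair is enough for a record to leave the database, and the composited pair is then expected to come back as a new composite record.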
4. Create duplicate free dataframe¶
Merge df_composite and df_dupfree_rmv to create duplicate free dataframe.
# the final duplicate free dataframe is the joined data from (1) tmp_df_dupfree_rmv, (2) tmp_df_composite
df_dupfree = pd.concat([tmp_df_dupfree_rmv, tmp_df_composite])
# joined intermediate dataframes from both rounds: (1) df_dupfree_rmv, (2) tmp_df_dupfree_rmv, (3) df_composite, (4) tmp_df_composite
df_duplica_rmv = pd.concat([df_dupfree_rmv, tmp_df_dupfree_rmv, df_composite, tmp_df_composite])
print(df_dupfree.info())
<class 'pandas.core.frame.DataFrame'>
Index: 5438 entries, pages2k_5 to dod2k_composite_z_ch2k_KU99HOU01_40_iso2k_788
Data columns (total 22 columns):
 #   Column                         Non-Null Count  Dtype
---  ------                         --------------  -----
 0   archiveType                    5438 non-null   object
 1   dataSetName                    5438 non-null   object
 2   geo_meanElev                   5356 non-null   float32
 3   geo_meanLat                    5438 non-null   float32
 4   geo_meanLon                    5438 non-null   float32
 5   geo_siteName                   5438 non-null   object
 6   interpretation_direction       5438 non-null   object
 7   interpretation_seasonality     5438 non-null   object
 8   interpretation_variable        5438 non-null   object
 9   interpretation_variableDetail  5438 non-null   object
 10  originalDataURL                5438 non-null   object
 11  originalDatabase               5438 non-null   object
 12  paleoData_notes                5438 non-null   object
 13  paleoData_proxy                5438 non-null   object
 14  paleoData_sensorSpecies        5405 non-null   object
 15  paleoData_units                5438 non-null   object
 16  paleoData_values               5438 non-null   object
 17  paleoData_variableName         5405 non-null   object
 18  year                           5438 non-null   object
 19  yearUnits                      5438 non-null   object
 20  datasetId                      5438 non-null   object
 21  duplicateDetails               5438 non-null   object
dtypes: float32(3), object(19)
memory usage: 913.4+ KB
None
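After merging, a quick sanity check is that no `datasetId` occurs twice in the joined dataframe. The sketch below shows one way to test this with toy data; the record IDs and site names are invented, and the check is a suggestion rather than a step of the notebook.

```python
import pandas as pd

# records that survived the removal step (toy data)
df_kept = pd.DataFrame({'geo_siteName': ['Site A', 'Site B']},
                       index=['rec_A', 'rec_B'])
# newly created composite record (toy data)
df_comp = pd.DataFrame({'geo_siteName': ['COMPOSITE: Site C + Site D']},
                       index=['composite_rec_C_rec_D'])

merged = pd.concat([df_kept, df_comp])
# index labels that occur more than once would point to a leftover duplicate
dup_ids = merged.index[merged.index.duplicated()].tolist()
print(merged.shape)
print(dup_ids)
```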
Save duplicate free dataframe¶
Sort the columns and assign a name to the dataframe, which is used for saving purposes (it determines the directory and filename). Make sure that the date and operator initials are used in the name.
df_dupfree = df_dupfree[sorted(df_dupfree.columns)]
df_dupfree.name =f'{df.name}_{initials}_{date}_dupfree'
os.makedirs(f'data/{df_dupfree.name}/', exist_ok=True)
df_dupfree.info()
print(df_dupfree.name)
<class 'pandas.core.frame.DataFrame'>
Index: 5438 entries, pages2k_5 to dod2k_composite_z_ch2k_KU99HOU01_40_iso2k_788
Data columns (total 22 columns):
 #   Column                         Non-Null Count  Dtype
---  ------                         --------------  -----
 0   archiveType                    5438 non-null   object
 1   dataSetName                    5438 non-null   object
 2   datasetId                      5438 non-null   object
 3   duplicateDetails               5438 non-null   object
 4   geo_meanElev                   5356 non-null   float32
 5   geo_meanLat                    5438 non-null   float32
 6   geo_meanLon                    5438 non-null   float32
 7   geo_siteName                   5438 non-null   object
 8   interpretation_direction       5438 non-null   object
 9   interpretation_seasonality     5438 non-null   object
 10  interpretation_variable        5438 non-null   object
 11  interpretation_variableDetail  5438 non-null   object
 12  originalDataURL                5438 non-null   object
 13  originalDatabase               5438 non-null   object
 14  paleoData_notes                5438 non-null   object
 15  paleoData_proxy                5438 non-null   object
 16  paleoData_sensorSpecies        5405 non-null   object
 17  paleoData_units                5438 non-null   object
 18  paleoData_values               5438 non-null   object
 19  paleoData_variableName         5405 non-null   object
 20  year                           5438 non-null   object
 21  yearUnits                      5438 non-null   object
dtypes: float32(3), object(19)
memory usage: 913.4+ KB
all_merged_LL_25-11-11_dupfree
save pickle¶
# save the duplicate free dataframe as a pickle file
df_dupfree.to_pickle(f'data/{df_dupfree.name}/{df_dupfree.name}_compact.pkl')
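A pickle keeps object columns as they are, which matters if `year` and `paleoData_values` hold one array per record (an assumption here, suggested by their `object` dtype). The round trip can be checked on a toy dataframe as sketched below; the file name and values are placeholders.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# toy compact dataframe: one array of years and values per record
df_toy = pd.DataFrame({
    'datasetId': ['rec_A', 'rec_B'],
    'year': [np.array([1990., 1991.]), np.array([1800., 1801., 1802.])],
    'paleoData_values': [np.array([0.1, 0.2]), np.array([1.0, 1.5, 2.0])],
})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'toy_compact.pkl')
    df_toy.to_pickle(path)
    df_back = pd.read_pickle(path)

# the arrays come back with their original lengths and values
same_arrays = all(np.array_equal(a, b)
                  for a, b in zip(df_toy['year'], df_back['year']))
print(same_arrays, df_back['year'].iloc[1].shape)
```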
save csv¶
# save to a list of csv files (metadata, data, year)
utf.write_compact_dataframe_to_csv(df_dupfree)
METADATA: datasetId, archiveType, dataSetName, duplicateDetails, geo_meanElev, geo_meanLat, geo_meanLon, geo_siteName, interpretation_direction, interpretation_seasonality, interpretation_variable, interpretation_variableDetail, originalDataURL, originalDatabase, paleoData_notes, paleoData_proxy, paleoData_sensorSpecies, paleoData_units, paleoData_variableName, yearUnits Saved to /home/jupyter-lluecke/dod2k_v2.0/dod2k/data/all_merged_LL_25-11-11_dupfree/all_merged_LL_25-11-11_dupfree_compact_%s.csv
# load dataframe
print(utf.load_compact_dataframe_from_csv(df_dupfree.name).info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5438 entries, 0 to 5437
Data columns (total 22 columns):
 #   Column                         Non-Null Count  Dtype
---  ------                         --------------  -----
 0   archiveType                    5438 non-null   object
 1   dataSetName                    5438 non-null   object
 2   datasetId                      5438 non-null   object
 3   duplicateDetails               5438 non-null   object
 4   geo_meanElev                   5356 non-null   float32
 5   geo_meanLat                    5438 non-null   float32
 6   geo_meanLon                    5438 non-null   float32
 7   geo_siteName                   5438 non-null   object
 8   interpretation_direction       5438 non-null   object
 9   interpretation_seasonality     5438 non-null   object
 10  interpretation_variable        5438 non-null   object
 11  interpretation_variableDetail  5438 non-null   object
 12  originalDataURL                5438 non-null   object
 13  originalDatabase               5438 non-null   object
 14  paleoData_notes                5438 non-null   object
 15  paleoData_proxy                5438 non-null   object
 16  paleoData_sensorSpecies        5438 non-null   object
 17  paleoData_units                5438 non-null   object
 18  paleoData_values               5438 non-null   object
 19  paleoData_variableName         5438 non-null   object
 20  year                           5438 non-null   object
 21  yearUnits                      5438 non-null   object
dtypes: float32(3), object(19)
memory usage: 871.1+ KB
None
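The internals of `utf.write_compact_dataframe_to_csv` are not shown in this notebook. As a rough illustration of how a compact dataframe can be split into a metadata table and one table per array column, and joined again after a CSV round trip, consider the sketch below. The helpers `split_compact`, `csv_roundtrip` and `join_compact` are hypothetical and are not part of `dod2k_utilities`.

```python
import io

import numpy as np
import pandas as pd

ARRAY_COLS = ['year', 'paleoData_values']

def split_compact(df):
    """Hypothetical helper: metadata table plus one wide table per array column."""
    meta = df.drop(columns=ARRAY_COLS)
    # one row per record; shorter records are padded with NaN
    wide = {col: pd.DataFrame({rid: pd.Series(arr)
                               for rid, arr in zip(df['datasetId'], df[col])}).T
            for col in ARRAY_COLS}
    return meta, wide

def csv_roundtrip(table):
    """Write a table to an in-memory CSV and read it back."""
    buf = io.StringIO()
    table.to_csv(buf)
    buf.seek(0)
    return pd.read_csv(buf, index_col=0)

def join_compact(meta, wide):
    """Hypothetical helper: rebuild the array columns from the wide tables."""
    out = meta.copy()
    for col, table in wide.items():
        # dropping NaN removes the padding (and any genuine missing values)
        out[col] = [table.loc[rid].dropna().to_numpy() for rid in out['datasetId']]
    return out

df_toy = pd.DataFrame({
    'datasetId': ['rec_A', 'rec_B'],
    'archiveType': ['Coral', 'Wood'],
    'year': [np.array([1990., 1991.]), np.array([1800., 1801., 1802.])],
    'paleoData_values': [np.array([0.1, 0.2]), np.array([1.0, 1.5, 2.0])],
})
meta, wide = split_compact(df_toy)
df_back = join_compact(csv_roundtrip(meta),
                       {col: csv_roundtrip(tab) for col, tab in wide.items()})
print(df_back['year'].iloc[0], df_back['paleoData_values'].iloc[1])
```

Comparing the `info()` output before and after the round trip is a useful check. In the outputs of this notebook the index changes from `datasetId` labels to a `RangeIndex`, and the non-null counts of `paleoData_sensorSpecies` and `paleoData_variableName` change from 5405 to 5438, so missing values in these columns appear to be stored differently after reloading.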
# write header with operator information as README txt file
with open(f'data/{df_dupfree.name}/{df_dupfree.name}_dupfree_README.txt', 'w') as file:
    for line in header:
        file.write(line+'\n')
fn = utf.find(df_dupfree.name, f'data/{df_dupfree.name}')
print(fn)
if fn != []:
    print('----------------------------------------------------')
    print('Sucessfully finished the duplicate finalising process!'.upper())
    print('----------------------------------------------------')
    print('Saved the final output files in:')
    print()
    for ff in fn:
        print(' '+os.getcwd()+'/%s.'%ff)
    print()
    print('The duplicate detection process is now finished and the duplicate free database is ready to use!')
else:
    print('!!!!!!!!!!!!WARNING!!!!!!!!!!!')
    print(f'Final output file is missing at data/{df_dupfree.name}.')
    print()
    print('Please re-run the notebook to complete duplicate finalising process.')
['data/all_merged_LL_25-11-11_dupfree/all_merged_LL_25-11-11_dupfree_compact.pkl', 'data/all_merged_LL_25-11-11_dupfree/all_merged_LL_25-11-11_dupfree_dupfree_README.txt', 'data/all_merged_LL_25-11-11_dupfree/all_merged_LL_25-11-11_dupfree_compact_metadata.csv', 'data/all_merged_LL_25-11-11_dupfree/all_merged_LL_25-11-11_dupfree_compact_year.csv', 'data/all_merged_LL_25-11-11_dupfree/all_merged_LL_25-11-11_dupfree_compact_paleoData_values.csv']
----------------------------------------------------
SUCESSFULLY FINISHED THE DUPLICATE FINALISING PROCESS!
----------------------------------------------------
Saved the final output files in:

 /home/jupyter-lluecke/dod2k_v2.0/dod2k/data/all_merged_LL_25-11-11_dupfree/all_merged_LL_25-11-11_dupfree_compact.pkl.
 /home/jupyter-lluecke/dod2k_v2.0/dod2k/data/all_merged_LL_25-11-11_dupfree/all_merged_LL_25-11-11_dupfree_dupfree_README.txt.
 /home/jupyter-lluecke/dod2k_v2.0/dod2k/data/all_merged_LL_25-11-11_dupfree/all_merged_LL_25-11-11_dupfree_compact_metadata.csv.
 /home/jupyter-lluecke/dod2k_v2.0/dod2k/data/all_merged_LL_25-11-11_dupfree/all_merged_LL_25-11-11_dupfree_compact_year.csv.
 /home/jupyter-lluecke/dod2k_v2.0/dod2k/data/all_merged_LL_25-11-11_dupfree/all_merged_LL_25-11-11_dupfree_compact_paleoData_values.csv.

The duplicate detection process is now finished and the duplicate free database is ready to use!
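The check above only tests whether any matching file was found. A stricter variant could test for each expected file individually, as in the sketch below. The suffixes are taken from the file list printed above, whereas the helper `missing_outputs` and the toy file names are hypothetical.

```python
import os
import tempfile

# suffixes of the data files listed in the output above
EXPECTED_SUFFIXES = ['_compact.pkl', '_compact_metadata.csv',
                     '_compact_year.csv', '_compact_paleoData_values.csv']

def missing_outputs(directory, name):
    """Hypothetical helper: expected output files that are not present in `directory`."""
    return [name + suffix for suffix in EXPECTED_SUFFIXES
            if not os.path.isfile(os.path.join(directory, name + suffix))]

with tempfile.TemporaryDirectory() as tmpdir:
    # create only the first two files to simulate an incomplete run
    for suffix in EXPECTED_SUFFIXES[:2]:
        open(os.path.join(tmpdir, 'toy_dupfree' + suffix), 'w').close()
    missing = missing_outputs(tmpdir, 'toy_dupfree')
print(missing)
```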
Summary and summary plots of datasets¶
Import plotting libraries
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec as GS
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from dod2k_utilities import ut_plot as uplt # contains plotting functions
#%% print some info about the data
db_types = df_dupfree_rmv['originalDatabase'].unique()
col = uplt.get_colours(range(len(db_types)), 'tab10', 0, len(db_types))
#col = ['tab:blue','tab:green', 'tab:grey', 'tab:pink', 'tab:orange']
counts = []
ticks = []
colours = []
for ii, db in enumerate(db_types):
    cc = df_dupfree_rmv['originalDatabase'][(df_dupfree_rmv['originalDatabase']==db)].count()
    counts += [cc]
    ticks += [db.split('(Ocn_103')[0]]
    colours += [col[ii]]
# plot a bar chart of the number of records per original database
fig = plt.figure(figsize=(8,4), dpi=200)
ax = plt.gca()
plt.bar(range(len(ticks)), counts, color=colours)
plt.xlabel('database')
plt.ylabel('count')
ax.set_xticks(range(len(ticks)), ticks, rotation=45, ha='right')
#ax.set_xticklabels(proxy_types, rotation=45, ha='right')
plt.title('original database')
fig.tight_layout()
plt.show()
utf.figsave(fig, 'SF_removed_recs_barchart_databases', add='%s/'%df_dupfree.name)
saved figure in /figs/all_merged_LL_25-11-11_dupfree//SF_removed_recs_barchart_databases.pdf
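The per-database counts collected in the loop above can also be obtained with `value_counts`. The sketch below uses toy data and reorders the result to follow the order of `unique()`, so that counts and tick labels line up in the same way as in the loop.

```python
import pandas as pd

df_toy = pd.DataFrame({'originalDatabase': ['Iso2k v1.1.2', 'PAGES 2k v2.2.0',
                                            'Iso2k v1.1.2', 'CoralHydro2k v1.0.1',
                                            'Iso2k v1.1.2']})

db_types = df_toy['originalDatabase'].unique()
# number of records per database, in order of first appearance
counts = df_toy['originalDatabase'].value_counts().reindex(db_types)
print(counts.index.tolist())
print(counts.tolist())
```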
#%% print some info about the data
proxy_types = df_dupfree_rmv['paleoData_proxy'].unique()
archive_types = df_dupfree_rmv['archiveType'].unique()
print(proxy_types)
print(archive_types)
col = uplt.get_colours(range(0,len(archive_types)), 'Accent', -1, len(archive_types))
counts = []
ticks = []
colours = []
for ii, at in enumerate(archive_types):
    proxy_types = df_dupfree_rmv['paleoData_proxy'][df_dupfree_rmv['archiveType']==at].unique()
    for pt in proxy_types:
        cc = df_dupfree_rmv['paleoData_proxy'][(df_dupfree_rmv['paleoData_proxy']==pt)&(df_dupfree_rmv['archiveType']==at)].count()
        # print('%25s'%pt+': '+str(cc))
        counts += [cc]
        ticks += [at+': '+pt]
        colours += [col[ii]]
['ring width' 'residualChronology' 'ARSTAN' 'RBAR' 'core' 'maximum latewood density' 'reflectance' 'EPS' 'd18O' 'd13C' 'Sr/Ca' 'Mg/Ca' 'temperature' 'historical' 'varve thickness' 'ice melt' 'alkenone' 'chironomid' 'Uk37' 'borehole' 'pollen' 'depth' 'dinocyst' 'count' 'concentration' 'chrysophyte assemblage' 'dD' 'calcification rate' 'depthTop' 'depthBottom' 'foraminifera' 'BSi' 'dust' 'chloride' 'sulfate' 'nitrate' 'thickness' 'duration' 'TEX86' 'effectivePrecipitation' 'diatom' 'multiproxy' 'humidificationIndex' 'accumulation rate' 'sodium' 'growth rate'] ['Wood' 'Coral' 'LakeSediment' 'MarineSediment' 'Documents' 'GlacierIce' 'Borehole' 'Sclerosponge' 'Speleothem' 'Other' 'GroundIce' 'MolluskShell' 'speleothem']
# plot a bar chart of the number of proxy types included in the dataset
fig = plt.figure(figsize=(8, 4), dpi=200)
ax = plt.gca()
plt.bar(range(len(ticks)), counts, color=colours)
plt.xlabel('proxy type')
plt.ylabel('count')
ax.set_xticks(range(len(ticks)), ticks, rotation=45, ha='right')
#ax.set_xticklabels(proxy_types, rotation=45, ha='right')
plt.title('removed proxy types')
fig.tight_layout()
plt.show()
utf.figsave(fig, 'SF_removed_recs_barchart_proxytypes', add='%s/'%df_dupfree.name)
saved figure in /figs/all_merged_LL_25-11-11_dupfree//SF_removed_recs_barchart_proxytypes.pdf
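The nested loop over archive and proxy types corresponds to a count per (archive type, proxy type) pair, which `groupby(...).size()` returns directly. A minimal sketch with toy data:

```python
import pandas as pd

df_toy = pd.DataFrame({
    'archiveType': ['Wood', 'Wood', 'Coral', 'Coral', 'Coral'],
    'paleoData_proxy': ['ring width', 'ring width', 'd18O', 'Sr/Ca', 'd18O'],
})

# number of records for each (archive type, proxy type) pair
counts = df_toy.groupby(['archiveType', 'paleoData_proxy'], sort=False).size()
# tick labels in the same 'archive: proxy' style as in the notebook
ticks = [f'{at}: {pt}' for at, pt in counts.index]
print(ticks)
print(counts.tolist())
```

With `sort=False` the groups follow the first appearance of each pair, so the order can differ from the nested loop if records of different archive types are interleaved in the dataframe.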
#%% plot the spatial distribution of the removed records
proxy_lats = df_dupfree_rmv['geo_meanLat'].values
proxy_lons = df_dupfree_rmv['geo_meanLon'].values
# plots the map
fig = plt.figure(figsize=(10, 5), dpi=200)
grid = GS(1, 3)
ax = plt.subplot(grid[:, -2:], projection=ccrs.Robinson()) # create axis with Robinson projection of globe
ax.stock_img()
ax.add_feature(cfeature.LAND) # adds land features
ax.coastlines() # adds coastline features
mt = 'ov^<>pP*XDd'*10 # generates string of marker types
archive_marker = {aa: mm for aa, mm in zip(archive_types, mt)} # attributes marker type to each archive type
archive_colour = {aa: cc for aa, cc in zip(archive_types, col)}
# loop through the data to generate a scatter plot of each data record:
# 1st loop: go through archive types individually (determines marker colour)
# 2nd loop: through paleo proxy types attributed to the specific archive (determines marker type)
for jj, at in enumerate(archive_types):
    arch_mask = df_dupfree_rmv['archiveType']==at
    arch_proxy_types = np.unique(df_dupfree_rmv['paleoData_proxy'][arch_mask])
    for ii, pt in enumerate(arch_proxy_types):
        pt_mask = df_dupfree_rmv['paleoData_proxy']==pt
        at_mask = df_dupfree_rmv['archiveType']==at
        plt.scatter(proxy_lons[pt_mask&at_mask], proxy_lats[pt_mask&at_mask],
                    transform=ccrs.PlateCarree(), zorder=999,
                    marker=mt[ii], color=archive_colour[at],
                    label=at+': '+pt+' ($n=%d$)'% df_dupfree_rmv['paleoData_proxy'][(df_dupfree_rmv['paleoData_proxy']==pt)&(df_dupfree_rmv['archiveType']==at)].count(),
                    lw=.5, ec='k')
plt.title('removed proxy types')
plt.legend(bbox_to_anchor=(0.03,1.1), ncol=2, fontsize=9, framealpha=0)
grid.tight_layout(fig)
utf.figsave(fig, 'SF_removed_spatial', add='%s/'%df_dupfree.name)
/tmp/ipykernel_703226/3804216409.py:36: UserWarning: Tight layout not applied. tight_layout cannot make axes width small enough to accommodate all axes decorations grid.tight_layout(fig)
saved figure in /figs/all_merged_LL_25-11-11_dupfree//SF_removed_spatial.pdf